{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 基于多目标优化的视频推荐\n",
    "\n",
    "## 赛题简介\n",
    "\n",
    "推荐系统大多都是基于隐式反馈来做推荐，比如用户的点击、观看时长、评论、分享等，且不同隐式反馈表达了用户不同的喜好程度。如果仅仅以单目标对推荐结果进行衡量，会存在衡量不全面的问题。如视频场景，假设某个用户打开一个视频看了开头觉得不喜欢立马关掉，如果以点击为目标则体现的是用户感兴趣，但实际情况是用户对这个视频不感兴趣。从这个例子可以看出，在视频推荐中如果仅仅以点击为目标，可能忽视了用户更深层次的隐式反馈。因此，视频推荐除了关注用户点击，还需关注用户观看时长、分享等目标，期望通过多目标能更深入地挖掘用户兴趣，做更精准的推荐。\n",
    "\n",
    "https://developer.huawei.com/consumer/cn/activity/devStarAI/algo/competition.html#/preliminary/info/006/introduction\n",
    "\n",
    "## 赛题说明\n",
    "\n",
    "本赛题提供14天数据用于训练，1天数据用于测试，数据包括用户特征，视频内容特征，以及用户历史行为数据，选手基于给出的数据，提供推荐策略，目标是预测每位用户观看视频时长所在区间，且预测是否对视频进行分享。所提供的数据经过脱敏处理，保证数据安全。\n",
    "\n",
    "## 赛题类型\n",
    "\n",
    "- 评价指标：AUC加权和\n",
    "- 赛题类型：用户留存预测，结构化数据"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 赛题数据\n",
    "\n",
    "https://digix-algo-challenge.obs.cn-east-2.myhuaweicloud.com/2021/AI/4d83d37a6bc6e83d0da3af019c18775c/2021_3_data.zip\n",
    "\n",
    "提供的数据包括用户维度、视频维度以及用户历史行为数据，以下按照这三个维度分别说明。\n",
    "\n",
    "\n",
    "### 用户特征\n",
    "\n",
    "| 字段名称 | 说明 | 是否为空 | 类型 | 取值样例 |\n",
    "| :------- | :--- | :------- | :--- | :------- |\n",
    "| user_id     | 用户ID   | 否   | string | 1    |\n",
    "| age         | 年龄段   | 是   | string | 0    |\n",
    "| gender      | 性别     | 是   | string | 1    |\n",
    "| country     | 国家     | 是   | string | 0    |\n",
    "| province    | 所在省份 | 是   | string | 0    |\n",
    "| city        | 所在城市 | 是   | string | 0    |\n",
    "| city_level  | 城市级别 | 是   | string | 0    |\n",
    "| device_name | 设备类型 | 是   | string | 0    |\n",
    "\n",
    "\n",
    "### 视频内容特征\n",
    "\n",
    "| 字段名称 | 说明 | 是否为空 | 类型 | 取值样例 |\n",
    "| :------- | :--- | :------- | :--- | :------- |\n",
    "| video_id            | 视频ID           | 否   | string | 16451                                                        |\n",
    "| video_name          | 视频名称         | 是   | string | 天下无贼                                                     |\n",
    "| video_tags          | 视频标签         | 是   | string | 扒手,反扒,t,tx,txm,txmz,txw,txwz,天下无贼,刘德华,刘若英,王宝强,冯小刚,张晞临,芒果TV,院线,剧情,李冰冰,葛优,动作,犯罪 |\n",
    "| video_description   | 视频描述         | 是   | string | 王薄（刘德华 饰）和王丽（刘若英 饰）本是一对最佳贼拍档，但因怀了王薄的孩子，王丽决定收手赎罪，两人产生分歧。在火车站遇到刚刚从城市里挣了一笔钱准备回老家用它盖房子娶媳妇的农村娃子傻根（王宝强 饰）后，王丽被他的单纯打动，决定暗中保护不使他的辛苦钱失窃，王薄却寻思找合适机会下手，但 最终因为“夫妻情深”归入了王丽的阵营。 不料傻根的钱早被以黎叔（葛优 饰）为头目的另一著名扒窃团伙盯上，于是一系列围绕傻根书包里的钞票、在王薄、王丽和黎叔团伙之间展开的强强斗争上演。 |\n",
    "| video_release_date  | 年代             | 是   | string | 2004/12/9                                                    |\n",
    "| video_director_list | 导演名称         | 是   | string | 冯小刚,林黎胜                                                |\n",
    "| video_actor_list    | 演员名称         | 是   | string | 刘德华,葛优,刘若英,王宝强                                    |\n",
    "| video_score         | 评分             | 是   | string | 8.5                                                          |\n",
    "| video_second_class  | 视频二级分类名称 | 是   | string | 喜剧,剧情,警匪,动作,犯罪                                     |\n",
    "| video_duration      | 视频时长，单位秒 | 是   | string | 7246                                                         |\n",
    "\n",
    "### 用户历史行为\n",
    "\n",
    "| 字段名称 | 说明 | 是否为空 | 类型 | 取值样例 |\n",
    "| :------- | :--- | :------- | :--- | :------- |\n",
    "| user_id          | 用户id             | 否   | string | 1         |\n",
    "| video_id         | 视频id             | 否   | string | 16451     |\n",
    "| is_watch         | 是否播放           | 是   | int    | 1         |\n",
    "| is_share         | 是否分享           | 是   | int    | 0         |\n",
    "| is_collect       | 是否收藏           | 是   | int    | 0         |\n",
    "| is_comment       | 是否评论           | 是   | int    | 0         |\n",
    "| watch_start_time | 观看时间           | 是   | string | 2020-11-3 |\n",
    "| watch_label      | 播放标签           | 是   | int    | 8         |\n",
    "| pt_d             | 时间，用于数据分区 | 是   | string | 20201103  |\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 评估方式\n",
    "\n",
    "预测每位用户对视频观看时长所在区间，且预测是否对视频进行分享。本次比赛使用 AUC（ROC曲线下面积）作为评估指标，AUC 越高，代表结果越优，排名越靠前。各指标的AUC采用加权求和。\n",
    "\n",
    "$$score = α*(0.1*AUC_{watch1}+ 0.2*AUC_{watch2}+0.3*AUC_{watch3}+0.4*AUC_{watch4}+0.5*AUC_{watch5}+0.6*AUC_{watch6}+0.7*AUC_{watch7} +0.8*AUC_{watch8} +0.9*AUC_{watch9} )+β*AUC_{share}$$\n",
    "\n",
    "初赛：α为0.7，β为0.3\n",
    "\n",
    "其中：视频观看时长、label、加权权重的对应关系如下。\n",
    "\n",
    "| 观看时长区间【左闭右开区间】 | watch_label | 加权权重 |\n",
    "| :--------------------------- | :---------- | :------- |\n",
    "| 0~10%总片长    | 0    | 0    |\n",
    "| 10~20%总片长   | 1    | 0.1  |\n",
    "| 20%~30%总片长  | 2    | 0.2  |\n",
    "| 30%~40%总片长  | 3    | 0.3  |\n",
    "| 40%~50%总片长  | 4    | 0.4  |\n",
    "| 50%~60%总片长  | 5    | 0.5  |\n",
    "| 60%~70%总片长  | 6    | 0.6  |\n",
    "| 70%~80%总片长  | 7    | 0.7  |\n",
    "| 80%~90%总片长  | 8    | 0.8  |\n",
    "| 90%~100%总片长 | 9    | 0.9  |\n",
    "\n",
    "## 提交方式\n",
    "\n",
    "选手提交结果为一个submission.csv 文件, 编码采用无BOM 的UTF-8，格式如下：user_id, video_id, watch_label, is_share。\n",
    "\n",
    "其中user_id对应测试样本中的user_id，video_id对应测试样本的video_id，watch_label表示user_id观看video_id时长所在的时长区间，is_share表示用户是否对该视频进行了分享。user_id，video_id，watch_label，is_share间采用逗号分隔。\n",
    "\n",
    "提交文件格式参考如下示例：\n",
    "\n",
    "| user_id | video_id | watch_label | is_share |\n",
    "| :------ | :------- | :---------- | :------- |\n",
    "| 1    | 1    | 6    | 0    |\n",
    "| 1    | 1    | 9    | 1    |\n",
    "| ...  | ...  | ...  | ...  |"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-18T16:50:32.739502Z",
     "iopub.status.busy": "2021-07-18T16:50:32.738928Z",
     "iopub.status.idle": "2021-07-18T16:50:32.858053Z",
     "shell.execute_reply": "2021-07-18T16:50:32.856611Z",
     "shell.execute_reply.started": "2021-07-18T16:50:32.739454Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "-rw-rw-r--  1  40M 2021-05-11 20:53:52.000000000 +0800 3/testdata/test.csv\n",
      "\n",
      "3/traindata/history_behavior_data:\n",
      "总用量 56K\n",
      "drwxrwxr-x 2 4.0K 2021-05-11 14:09:32.000000000 +0800 20210419\n",
      "drwxrwxr-x 2 4.0K 2021-05-11 14:09:36.000000000 +0800 20210420\n",
      "drwxrwxr-x 2 4.0K 2021-05-11 14:09:40.000000000 +0800 20210421\n",
      "drwxrwxr-x 2 4.0K 2021-05-11 14:09:44.000000000 +0800 20210422\n",
      "drwxrwxr-x 2 4.0K 2021-05-11 14:09:46.000000000 +0800 20210423\n",
      "drwxrwxr-x 2 4.0K 2021-05-11 14:09:54.000000000 +0800 20210424\n",
      "drwxrwxr-x 2 4.0K 2021-05-11 14:09:58.000000000 +0800 20210425\n",
      "drwxrwxr-x 2 4.0K 2021-05-11 14:10:02.000000000 +0800 20210426\n",
      "drwxrwxr-x 2 4.0K 2021-05-11 14:10:06.000000000 +0800 20210427\n",
      "drwxrwxr-x 2 4.0K 2021-05-11 14:10:10.000000000 +0800 20210428\n",
      "drwxrwxr-x 2 4.0K 2021-05-11 14:10:12.000000000 +0800 20210429\n",
      "drwxrwxr-x 2 4.0K 2021-05-11 14:10:18.000000000 +0800 20210430\n",
      "drwxrwxr-x 2 4.0K 2021-05-11 14:10:22.000000000 +0800 20210501\n",
      "drwxrwxr-x 2 4.0K 2021-05-11 14:10:24.000000000 +0800 20210502\n",
      "\n",
      "3/traindata/user_features_data:\n",
      "总用量 139M\n",
      "-rw-rw-r-- 1 139M 2021-05-11 10:28:00.000000000 +0800 user_features_data.csv\n",
      "\n",
      "3/traindata/video_features_data:\n",
      "总用量 35M\n",
      "-rw-rw-r-- 1 35M 2021-05-11 10:27:42.000000000 +0800 video_features_data.csv\n"
     ]
    }
   ],
   "source": [
    "!ls 3/*/* --full-time -Ggh"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-18T17:15:46.427165Z",
     "iopub.status.busy": "2021-07-18T17:15:46.426685Z",
     "iopub.status.idle": "2021-07-18T17:15:46.558558Z",
     "shell.execute_reply": "2021-07-18T17:15:46.557438Z",
     "shell.execute_reply.started": "2021-07-18T17:15:46.427125Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "part-00000-236b99d5-456a-42b2-bd8d-3cbd61d21cc6-c000.csv\n"
     ]
    }
   ],
   "source": [
    "!ls 3/traindata/history_behavior_data/20210419/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 数据读取"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-08-17T13:54:36.008907Z",
     "iopub.status.busy": "2021-08-17T13:54:36.008357Z",
     "iopub.status.idle": "2021-08-17T13:54:36.143037Z",
     "shell.execute_reply": "2021-08-17T13:54:36.141670Z",
     "shell.execute_reply.started": "2021-08-17T13:54:36.008861Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "50356 ./3/traindata/video_features_data/video_features_data.csv\n"
     ]
    }
   ],
   "source": [
    "!wc -l ./3/traindata/video_features_data/video_features_data.csv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-08-17T13:56:26.274138Z",
     "iopub.status.busy": "2021-08-17T13:56:26.273561Z",
     "iopub.status.idle": "2021-08-17T13:56:26.663158Z",
     "shell.execute_reply": "2021-08-17T13:56:26.662676Z",
     "shell.execute_reply.started": "2021-08-17T13:56:26.274089Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>video_id</th>\n",
       "      <th>video_name</th>\n",
       "      <th>video_tags</th>\n",
       "      <th>video_description</th>\n",
       "      <th>video_release_date</th>\n",
       "      <th>video_director_list</th>\n",
       "      <th>video_actor_list</th>\n",
       "      <th>video_score</th>\n",
       "      <th>video_second_class</th>\n",
       "      <th>video_duration</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>3460</td>\n",
       "      <td>脱皮爸爸</td>\n",
       "      <td>院线电影,家庭关系,命运</td>\n",
       "      <td>中年失意的儿子田力行（古天乐饰）在生活上遇到了重重危机：母亲病逝,工作不顺,妻子要求离婚。正...</td>\n",
       "      <td>2017-04-27</td>\n",
       "      <td>司徒慧焯</td>\n",
       "      <td>吴镇宇,古天乐,春夏,蔡洁</td>\n",
       "      <td>7.4</td>\n",
       "      <td>剧情,喜剧,奇幻</td>\n",
       "      <td>5913</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>14553</td>\n",
       "      <td>喜气洋洋小金莲</td>\n",
       "      <td>古装喜剧,剧情片,喜剧片,内地电影,欢乐喜剧,爱情纠纷</td>\n",
       "      <td>故事始于西门庆为西门药业的“伟哥”产品寻找代言人，西门庆初见潘金莲，一时惊为天人，为成功抱得...</td>\n",
       "      <td>2015-12-30</td>\n",
       "      <td>杨珊珊,李亚玲</td>\n",
       "      <td>陈南飞,程隆妮,王闯,贾海涛,闫薇儿</td>\n",
       "      <td>5.6</td>\n",
       "      <td>喜剧</td>\n",
       "      <td>6217</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1214</td>\n",
       "      <td>风流家族</td>\n",
       "      <td>男女关系,家庭关系,命运,院线电影</td>\n",
       "      <td>香世仁（钟镇涛 饰）是家财万贯的香港富豪，在满足了一切物质上的要求后，他将生活的重心放在了儿...</td>\n",
       "      <td>2002-03-07</td>\n",
       "      <td>邱礼涛,杨漪珊</td>\n",
       "      <td>张家辉,卢巧音,钟镇涛,叶童,李蕙敏,张坚庭,袁洁莹,黄佩霞,齐芷瑶,刘以达,叶伟信,邹凯光...</td>\n",
       "      <td>6.8</td>\n",
       "      <td>都市,喜剧,爱情,家庭</td>\n",
       "      <td>5963</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>30639</td>\n",
       "      <td>大提琴的故事</td>\n",
       "      <td>短片,动画片</td>\n",
       "      <td>低音大提琴演奏家史密斯科夫正要去参加某贵族的沙龙，途中他被河边的美丽景色所吸引，驻足观看。兴...</td>\n",
       "      <td>1949-01-01</td>\n",
       "      <td>伊里·特恩卡,契诃夫</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>动画,爱情</td>\n",
       "      <td>17371</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>38522</td>\n",
       "      <td>歌舞大王齐格飞</td>\n",
       "      <td>喜剧片,人物传记,浪漫爱情</td>\n",
       "      <td>罗伯特．Z．伦纳德导演的这部影片以百老汇最大的歌舞团——齐格菲歌舞团的创办人佛罗伦斯．齐格菲...</td>\n",
       "      <td>1936-04-08</td>\n",
       "      <td>罗伯特·Z·伦纳德,William Anthony McGuire</td>\n",
       "      <td>威廉·鲍威尔,玛娜·洛伊,路易丝·赖纳,弗兰克·摩根,范妮·布莱斯,弗吉尼亚·布鲁斯,雷吉纳...</td>\n",
       "      <td>7.7</td>\n",
       "      <td>剧情,歌舞,喜剧</td>\n",
       "      <td>10608</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>49726</th>\n",
       "      <td>36024</td>\n",
       "      <td>跆拳震九州岛</td>\n",
       "      <td>动作片,战争片</td>\n",
       "      <td>日本占领韩国时期，日军为镇压反抗志士，设特务机关，汉城“横山道馆”亦属其行列。跆拳道首徒金正...</td>\n",
       "      <td>1973-09-12</td>\n",
       "      <td>黄枫</td>\n",
       "      <td>茅瑛,黄家达,黄仁植,安德鲁·摩根,陈会毅,金琪珠</td>\n",
       "      <td>6.9</td>\n",
       "      <td>剧情,动作,战争</td>\n",
       "      <td>5736</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>49727</th>\n",
       "      <td>11306</td>\n",
       "      <td>小豹杜玛</td>\n",
       "      <td>境外院线,冒险,家庭,人与动物,南非草原</td>\n",
       "      <td>广袤野性的南非草原。入夜，一只迷路的印度豹幼崽跌跌撞撞地闯进高速公路。幸运的是，彼得（坎贝尔...</td>\n",
       "      <td>2005-04-22</td>\n",
       "      <td>拉罗尔·巴尔兰德</td>\n",
       "      <td>Alex Michaeletos,坎贝尔·斯科特,霍普·戴维斯,伊默恩·沃克,Mary Ma...</td>\n",
       "      <td>8.9</td>\n",
       "      <td>剧情,冒险,家庭,儿童</td>\n",
       "      <td>6034</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>49728</th>\n",
       "      <td>16178</td>\n",
       "      <td>父子</td>\n",
       "      <td>家庭关系,人际关系,命运</td>\n",
       "      <td>在马来西亚的一个华人社区里，伴随着一串年轻稚嫩的歌声，小男孩阿宝梦见爸爸骑着自行车载着他穿过...</td>\n",
       "      <td>2006-11-30</td>\n",
       "      <td>谭家明,田开良</td>\n",
       "      <td>郭富城,吴澋滔,杨采妮,林熙蕾</td>\n",
       "      <td>8.0</td>\n",
       "      <td>剧情,家庭</td>\n",
       "      <td>6918</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>49729</th>\n",
       "      <td>5337</td>\n",
       "      <td>泪痕</td>\n",
       "      <td>院线电影,矛盾冲突,冲破万难,人性,人际关系,家庭关系,阴谋,挽救局面,恶势力</td>\n",
       "      <td>“四人帮”被打倒后，朱克实（李仁堂 饰）受上级指派，作为金县的县委书记走马上任。此前该县书记...</td>\n",
       "      <td>1979-01-01</td>\n",
       "      <td>李文化,马烽,孙谦</td>\n",
       "      <td>李仁堂,谢芳,杨威,邵万林,方辉,许福印,茂路,侯冠群</td>\n",
       "      <td>7.9</td>\n",
       "      <td>剧情</td>\n",
       "      <td>6885</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>49730</th>\n",
       "      <td>11189</td>\n",
       "      <td>一哥</td>\n",
       "      <td>国庆档</td>\n",
       "      <td>史毕堡（万梓良 饰）是个绝世桥王，好友高达威（苗侨伟 饰）却是个有勇无谋的CID。高达威发现...</td>\n",
       "      <td>1987-10-01</td>\n",
       "      <td>范秀明,梁鸿华</td>\n",
       "      <td>万梓良,朱宝意,任达华,苗侨伟</td>\n",
       "      <td>7.1</td>\n",
       "      <td>动作,剧情</td>\n",
       "      <td>5175</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>49731 rows × 10 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "       video_id video_name                               video_tags  \\\n",
       "0          3460       脱皮爸爸                             院线电影,家庭关系,命运   \n",
       "1         14553    喜气洋洋小金莲              古装喜剧,剧情片,喜剧片,内地电影,欢乐喜剧,爱情纠纷   \n",
       "2          1214       风流家族                        男女关系,家庭关系,命运,院线电影   \n",
       "3         30639     大提琴的故事                                   短片,动画片   \n",
       "4         38522    歌舞大王齐格飞                            喜剧片,人物传记,浪漫爱情   \n",
       "...         ...        ...                                      ...   \n",
       "49726     36024     跆拳震九州岛                                  动作片,战争片   \n",
       "49727     11306       小豹杜玛                     境外院线,冒险,家庭,人与动物,南非草原   \n",
       "49728     16178         父子                             家庭关系,人际关系,命运   \n",
       "49729      5337         泪痕  院线电影,矛盾冲突,冲破万难,人性,人际关系,家庭关系,阴谋,挽救局面,恶势力   \n",
       "49730     11189         一哥                                      国庆档   \n",
       "\n",
       "                                       video_description video_release_date  \\\n",
       "0      中年失意的儿子田力行（古天乐饰）在生活上遇到了重重危机：母亲病逝,工作不顺,妻子要求离婚。正...         2017-04-27   \n",
       "1      故事始于西门庆为西门药业的“伟哥”产品寻找代言人，西门庆初见潘金莲，一时惊为天人，为成功抱得...         2015-12-30   \n",
       "2      香世仁（钟镇涛 饰）是家财万贯的香港富豪，在满足了一切物质上的要求后，他将生活的重心放在了儿...         2002-03-07   \n",
       "3      低音大提琴演奏家史密斯科夫正要去参加某贵族的沙龙，途中他被河边的美丽景色所吸引，驻足观看。兴...         1949-01-01   \n",
       "4      罗伯特．Z．伦纳德导演的这部影片以百老汇最大的歌舞团——齐格菲歌舞团的创办人佛罗伦斯．齐格菲...         1936-04-08   \n",
       "...                                                  ...                ...   \n",
       "49726  日本占领韩国时期，日军为镇压反抗志士，设特务机关，汉城“横山道馆”亦属其行列。跆拳道首徒金正...         1973-09-12   \n",
       "49727  广袤野性的南非草原。入夜，一只迷路的印度豹幼崽跌跌撞撞地闯进高速公路。幸运的是，彼得（坎贝尔...         2005-04-22   \n",
       "49728  在马来西亚的一个华人社区里，伴随着一串年轻稚嫩的歌声，小男孩阿宝梦见爸爸骑着自行车载着他穿过...         2006-11-30   \n",
       "49729  “四人帮”被打倒后，朱克实（李仁堂 饰）受上级指派，作为金县的县委书记走马上任。此前该县书记...         1979-01-01   \n",
       "49730  史毕堡（万梓良 饰）是个绝世桥王，好友高达威（苗侨伟 饰）却是个有勇无谋的CID。高达威发现...         1987-10-01   \n",
       "\n",
       "                     video_director_list  \\\n",
       "0                                   司徒慧焯   \n",
       "1                                杨珊珊,李亚玲   \n",
       "2                                邱礼涛,杨漪珊   \n",
       "3                             伊里·特恩卡,契诃夫   \n",
       "4      罗伯特·Z·伦纳德,William Anthony McGuire   \n",
       "...                                  ...   \n",
       "49726                                 黄枫   \n",
       "49727                           拉罗尔·巴尔兰德   \n",
       "49728                            谭家明,田开良   \n",
       "49729                          李文化,马烽,孙谦   \n",
       "49730                            范秀明,梁鸿华   \n",
       "\n",
       "                                        video_actor_list  video_score  \\\n",
       "0                                          吴镇宇,古天乐,春夏,蔡洁          7.4   \n",
       "1                                     陈南飞,程隆妮,王闯,贾海涛,闫薇儿          5.6   \n",
       "2      张家辉,卢巧音,钟镇涛,叶童,李蕙敏,张坚庭,袁洁莹,黄佩霞,齐芷瑶,刘以达,叶伟信,邹凯光...          6.8   \n",
       "3                                                    NaN          NaN   \n",
       "4      威廉·鲍威尔,玛娜·洛伊,路易丝·赖纳,弗兰克·摩根,范妮·布莱斯,弗吉尼亚·布鲁斯,雷吉纳...          7.7   \n",
       "...                                                  ...          ...   \n",
       "49726                          茅瑛,黄家达,黄仁植,安德鲁·摩根,陈会毅,金琪珠          6.9   \n",
       "49727  Alex Michaeletos,坎贝尔·斯科特,霍普·戴维斯,伊默恩·沃克,Mary Ma...          8.9   \n",
       "49728                                    郭富城,吴澋滔,杨采妮,林熙蕾          8.0   \n",
       "49729                        李仁堂,谢芳,杨威,邵万林,方辉,许福印,茂路,侯冠群          7.9   \n",
       "49730                                    万梓良,朱宝意,任达华,苗侨伟          7.1   \n",
       "\n",
       "      video_second_class  video_duration  \n",
       "0               剧情,喜剧,奇幻            5913  \n",
       "1                     喜剧            6217  \n",
       "2            都市,喜剧,爱情,家庭            5963  \n",
       "3                  动画,爱情           17371  \n",
       "4               剧情,歌舞,喜剧           10608  \n",
       "...                  ...             ...  \n",
       "49726           剧情,动作,战争            5736  \n",
       "49727        剧情,冒险,家庭,儿童            6034  \n",
       "49728              剧情,家庭            6918  \n",
       "49729                 剧情            6885  \n",
       "49730              动作,剧情            5175  \n",
       "\n",
       "[49731 rows x 10 columns]"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "INPUT_PATH = './3'\n",
    "video_features = pd.read_csv(f'{INPUT_PATH}/traindata/video_features_data/video_features_data.csv', sep='\\t')\n",
    "video_features"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-08-17T14:04:45.191526Z",
     "iopub.status.busy": "2021-08-17T14:04:45.190946Z",
     "iopub.status.idle": "2021-08-17T14:04:45.472297Z",
     "shell.execute_reply": "2021-08-17T14:04:45.471740Z",
     "shell.execute_reply.started": "2021-08-17T14:04:45.191479Z"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "import codecs\n",
    "import numpy as np\n",
    "lines = codecs.open('./3/traindata/video_features_data/video_features_data.csv').readlines()\n",
    "df = pd.DataFrame([x.strip().split('\\t') for x in lines[1:] if len(x.split('\\t')) == 10])\n",
    "df.columns = lines[0].strip().split('\\t')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-08-17T14:05:05.519157Z",
     "iopub.status.busy": "2021-08-17T14:05:05.518595Z",
     "iopub.status.idle": "2021-08-17T14:05:05.536700Z",
     "shell.execute_reply": "2021-08-17T14:05:05.536062Z",
     "shell.execute_reply.started": "2021-08-17T14:05:05.519122Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>video_id</th>\n",
       "      <th>video_name</th>\n",
       "      <th>video_tags</th>\n",
       "      <th>video_description</th>\n",
       "      <th>video_release_date</th>\n",
       "      <th>video_director_list</th>\n",
       "      <th>video_actor_list</th>\n",
       "      <th>video_score</th>\n",
       "      <th>video_second_class</th>\n",
       "      <th>video_duration</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>3460</td>\n",
       "      <td>脱皮爸爸</td>\n",
       "      <td>院线电影,家庭关系,命运</td>\n",
       "      <td>中年失意的儿子田力行（古天乐饰）在生活上遇到了重重危机：母亲病逝,工作不顺,妻子要求离婚。正...</td>\n",
       "      <td>2017-04-27</td>\n",
       "      <td>司徒慧焯</td>\n",
       "      <td>吴镇宇,古天乐,春夏,蔡洁</td>\n",
       "      <td>7.4</td>\n",
       "      <td>剧情,喜剧,奇幻</td>\n",
       "      <td>5913</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>14553</td>\n",
       "      <td>喜气洋洋小金莲</td>\n",
       "      <td>古装喜剧,剧情片,喜剧片,内地电影,欢乐喜剧,爱情纠纷</td>\n",
       "      <td>故事始于西门庆为西门药业的“伟哥”产品寻找代言人，西门庆初见潘金莲，一时惊为天人，为成功抱得...</td>\n",
       "      <td>2015-12-30</td>\n",
       "      <td>杨珊珊,李亚玲</td>\n",
       "      <td>陈南飞,程隆妮,王闯,贾海涛,闫薇儿</td>\n",
       "      <td>5.6</td>\n",
       "      <td>喜剧</td>\n",
       "      <td>6217</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1214</td>\n",
       "      <td>风流家族</td>\n",
       "      <td>男女关系,家庭关系,命运,院线电影</td>\n",
       "      <td>香世仁（钟镇涛 饰）是家财万贯的香港富豪，在满足了一切物质上的要求后，他将生活的重心放在了儿...</td>\n",
       "      <td>2002-03-07</td>\n",
       "      <td>邱礼涛,杨漪珊</td>\n",
       "      <td>张家辉,卢巧音,钟镇涛,叶童,李蕙敏,张坚庭,袁洁莹,黄佩霞,齐芷瑶,刘以达,叶伟信,邹凯光...</td>\n",
       "      <td>6.8</td>\n",
       "      <td>都市,喜剧,爱情,家庭</td>\n",
       "      <td>5963</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>30639</td>\n",
       "      <td>大提琴的故事</td>\n",
       "      <td>短片,动画片</td>\n",
       "      <td>低音大提琴演奏家史密斯科夫正要去参加某贵族的沙龙，途中他被河边的美丽景色所吸引，驻足观看。兴...</td>\n",
       "      <td>1949-01-01</td>\n",
       "      <td>伊里·特恩卡,契诃夫</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>动画,爱情</td>\n",
       "      <td>17371</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>38522</td>\n",
       "      <td>歌舞大王齐格飞</td>\n",
       "      <td>喜剧片,人物传记,浪漫爱情</td>\n",
       "      <td>罗伯特．Z．伦纳德导演的这部影片以百老汇最大的歌舞团——齐格菲歌舞团的创办人佛罗伦斯．齐格菲...</td>\n",
       "      <td>1936-04-08</td>\n",
       "      <td>罗伯特·Z·伦纳德,William Anthony McGuire</td>\n",
       "      <td>威廉·鲍威尔,玛娜·洛伊,路易丝·赖纳,弗兰克·摩根,范妮·布莱斯,弗吉尼亚·布鲁斯,雷吉纳...</td>\n",
       "      <td>7.7</td>\n",
       "      <td>剧情,歌舞,喜剧</td>\n",
       "      <td>10608</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50339</th>\n",
       "      <td>36024</td>\n",
       "      <td>跆拳震九州岛</td>\n",
       "      <td>动作片,战争片</td>\n",
       "      <td>日本占领韩国时期，日军为镇压反抗志士，设特务机关，汉城“横山道馆”亦属其行列。跆拳道首徒金正...</td>\n",
       "      <td>1973-09-12</td>\n",
       "      <td>黄枫</td>\n",
       "      <td>茅瑛,黄家达,黄仁植,安德鲁·摩根,陈会毅,金琪珠</td>\n",
       "      <td>6.9</td>\n",
       "      <td>剧情,动作,战争</td>\n",
       "      <td>5736</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50340</th>\n",
       "      <td>11306</td>\n",
       "      <td>小豹杜玛</td>\n",
       "      <td>境外院线,冒险,家庭,人与动物,南非草原</td>\n",
       "      <td>广袤野性的南非草原。入夜，一只迷路的印度豹幼崽跌跌撞撞地闯进高速公路。幸运的是，彼得（坎贝尔...</td>\n",
       "      <td>2005-04-22</td>\n",
       "      <td>拉罗尔·巴尔兰德</td>\n",
       "      <td>Alex Michaeletos,坎贝尔·斯科特,霍普·戴维斯,伊默恩·沃克,Mary Ma...</td>\n",
       "      <td>8.9</td>\n",
       "      <td>剧情,冒险,家庭,儿童</td>\n",
       "      <td>6034</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50341</th>\n",
       "      <td>16178</td>\n",
       "      <td>父子</td>\n",
       "      <td>家庭关系,人际关系,命运</td>\n",
       "      <td>在马来西亚的一个华人社区里，伴随着一串年轻稚嫩的歌声，小男孩阿宝梦见爸爸骑着自行车载着他穿过...</td>\n",
       "      <td>2006-11-30</td>\n",
       "      <td>谭家明,田开良</td>\n",
       "      <td>郭富城,吴澋滔,杨采妮,林熙蕾</td>\n",
       "      <td>8.0</td>\n",
       "      <td>剧情,家庭</td>\n",
       "      <td>6918</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50342</th>\n",
       "      <td>5337</td>\n",
       "      <td>泪痕</td>\n",
       "      <td>院线电影,矛盾冲突,冲破万难,人性,人际关系,家庭关系,阴谋,挽救局面,恶势力</td>\n",
       "      <td>“四人帮”被打倒后，朱克实（李仁堂 饰）受上级指派，作为金县的县委书记走马上任。此前该县书记...</td>\n",
       "      <td>1979-01-01</td>\n",
       "      <td>李文化,马烽,孙谦</td>\n",
       "      <td>李仁堂,谢芳,杨威,邵万林,方辉,许福印,茂路,侯冠群</td>\n",
       "      <td>7.9</td>\n",
       "      <td>剧情</td>\n",
       "      <td>6885</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50343</th>\n",
       "      <td>11189</td>\n",
       "      <td>一哥</td>\n",
       "      <td>国庆档</td>\n",
       "      <td>史毕堡（万梓良 饰）是个绝世桥王，好友高达威（苗侨伟 饰）却是个有勇无谋的CID。高达威发现...</td>\n",
       "      <td>1987-10-01</td>\n",
       "      <td>范秀明,梁鸿华</td>\n",
       "      <td>万梓良,朱宝意,任达华,苗侨伟</td>\n",
       "      <td>7.1</td>\n",
       "      <td>动作,剧情</td>\n",
       "      <td>5175</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>50344 rows × 10 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      video_id video_name                               video_tags  \\\n",
       "0         3460       脱皮爸爸                             院线电影,家庭关系,命运   \n",
       "1        14553    喜气洋洋小金莲              古装喜剧,剧情片,喜剧片,内地电影,欢乐喜剧,爱情纠纷   \n",
       "2         1214       风流家族                        男女关系,家庭关系,命运,院线电影   \n",
       "3        30639     大提琴的故事                                   短片,动画片   \n",
       "4        38522    歌舞大王齐格飞                            喜剧片,人物传记,浪漫爱情   \n",
       "...        ...        ...                                      ...   \n",
       "50339    36024     跆拳震九州岛                                  动作片,战争片   \n",
       "50340    11306       小豹杜玛                     境外院线,冒险,家庭,人与动物,南非草原   \n",
       "50341    16178         父子                             家庭关系,人际关系,命运   \n",
       "50342     5337         泪痕  院线电影,矛盾冲突,冲破万难,人性,人际关系,家庭关系,阴谋,挽救局面,恶势力   \n",
       "50343    11189         一哥                                      国庆档   \n",
       "\n",
       "                                       video_description video_release_date  \\\n",
       "0      中年失意的儿子田力行（古天乐饰）在生活上遇到了重重危机：母亲病逝,工作不顺,妻子要求离婚。正...         2017-04-27   \n",
       "1      故事始于西门庆为西门药业的“伟哥”产品寻找代言人，西门庆初见潘金莲，一时惊为天人，为成功抱得...         2015-12-30   \n",
       "2      香世仁（钟镇涛 饰）是家财万贯的香港富豪，在满足了一切物质上的要求后，他将生活的重心放在了儿...         2002-03-07   \n",
       "3      低音大提琴演奏家史密斯科夫正要去参加某贵族的沙龙，途中他被河边的美丽景色所吸引，驻足观看。兴...         1949-01-01   \n",
       "4      罗伯特．Z．伦纳德导演的这部影片以百老汇最大的歌舞团——齐格菲歌舞团的创办人佛罗伦斯．齐格菲...         1936-04-08   \n",
       "...                                                  ...                ...   \n",
       "50339  日本占领韩国时期，日军为镇压反抗志士，设特务机关，汉城“横山道馆”亦属其行列。跆拳道首徒金正...         1973-09-12   \n",
       "50340  广袤野性的南非草原。入夜，一只迷路的印度豹幼崽跌跌撞撞地闯进高速公路。幸运的是，彼得（坎贝尔...         2005-04-22   \n",
       "50341  在马来西亚的一个华人社区里，伴随着一串年轻稚嫩的歌声，小男孩阿宝梦见爸爸骑着自行车载着他穿过...         2006-11-30   \n",
       "50342  “四人帮”被打倒后，朱克实（李仁堂 饰）受上级指派，作为金县的县委书记走马上任。此前该县书记...         1979-01-01   \n",
       "50343  史毕堡（万梓良 饰）是个绝世桥王，好友高达威（苗侨伟 饰）却是个有勇无谋的CID。高达威发现...         1987-10-01   \n",
       "\n",
       "                     video_director_list  \\\n",
       "0                                   司徒慧焯   \n",
       "1                                杨珊珊,李亚玲   \n",
       "2                                邱礼涛,杨漪珊   \n",
       "3                             伊里·特恩卡,契诃夫   \n",
       "4      罗伯特·Z·伦纳德,William Anthony McGuire   \n",
       "...                                  ...   \n",
       "50339                                 黄枫   \n",
       "50340                           拉罗尔·巴尔兰德   \n",
       "50341                            谭家明,田开良   \n",
       "50342                          李文化,马烽,孙谦   \n",
       "50343                            范秀明,梁鸿华   \n",
       "\n",
       "                                        video_actor_list video_score  \\\n",
       "0                                          吴镇宇,古天乐,春夏,蔡洁         7.4   \n",
       "1                                     陈南飞,程隆妮,王闯,贾海涛,闫薇儿         5.6   \n",
       "2      张家辉,卢巧音,钟镇涛,叶童,李蕙敏,张坚庭,袁洁莹,黄佩霞,齐芷瑶,刘以达,叶伟信,邹凯光...         6.8   \n",
       "3                                                                      \n",
       "4      威廉·鲍威尔,玛娜·洛伊,路易丝·赖纳,弗兰克·摩根,范妮·布莱斯,弗吉尼亚·布鲁斯,雷吉纳...         7.7   \n",
       "...                                                  ...         ...   \n",
       "50339                          茅瑛,黄家达,黄仁植,安德鲁·摩根,陈会毅,金琪珠         6.9   \n",
       "50340  Alex Michaeletos,坎贝尔·斯科特,霍普·戴维斯,伊默恩·沃克,Mary Ma...         8.9   \n",
       "50341                                    郭富城,吴澋滔,杨采妮,林熙蕾         8.0   \n",
       "50342                        李仁堂,谢芳,杨威,邵万林,方辉,许福印,茂路,侯冠群         7.9   \n",
       "50343                                    万梓良,朱宝意,任达华,苗侨伟         7.1   \n",
       "\n",
       "      video_second_class video_duration  \n",
       "0               剧情,喜剧,奇幻           5913  \n",
       "1                     喜剧           6217  \n",
       "2            都市,喜剧,爱情,家庭           5963  \n",
       "3                  动画,爱情          17371  \n",
       "4               剧情,歌舞,喜剧          10608  \n",
       "...                  ...            ...  \n",
       "50339           剧情,动作,战争           5736  \n",
       "50340        剧情,冒险,家庭,儿童           6034  \n",
       "50341              剧情,家庭           6918  \n",
       "50342                 剧情           6885  \n",
       "50343              动作,剧情           5175  \n",
       "\n",
       "[50344 rows x 10 columns]"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-08-17T14:05:16.168448Z",
     "iopub.status.busy": "2021-08-17T14:05:16.167850Z",
     "iopub.status.idle": "2021-08-17T14:05:16.219765Z",
     "shell.execute_reply": "2021-08-17T14:05:16.219176Z",
     "shell.execute_reply.started": "2021-08-17T14:05:16.168402Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "      <th>5</th>\n",
       "      <th>6</th>\n",
       "      <th>7</th>\n",
       "      <th>8</th>\n",
       "      <th>9</th>\n",
       "      <th>...</th>\n",
       "      <th>15</th>\n",
       "      <th>16</th>\n",
       "      <th>17</th>\n",
       "      <th>18</th>\n",
       "      <th>19</th>\n",
       "      <th>20</th>\n",
       "      <th>21</th>\n",
       "      <th>22</th>\n",
       "      <th>23</th>\n",
       "      <th>24</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>28701</td>\n",
       "      <td>看不见的月亮</td>\n",
       "      <td>剧情片</td>\n",
       "      <td>\"《看不见的月亮》由José Pepe Bojórquez</td>\n",
       "      <td>执导。韦斯·本特利、安娜·莎若狄拉、乔纳森·斯卡奇、赫克特·吉门雷兹主演。父亲去世后，Vic...</td>\n",
       "      <td>2012-11-23</td>\n",
       "      <td>José Pepe Bojórquez</td>\n",
       "      <td>安娜·莎若狄拉,赫克特·吉门雷兹,安赫丽卡·玛丽亚</td>\n",
       "      <td>7.1</td>\n",
       "      <td>悬疑,剧情</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>37918</td>\n",
       "      <td>科学怪人</td>\n",
       "      <td>科幻片,惊悚片</td>\n",
       "      <td>\"以前正大综艺的正大剧场放过的经典影片.电影里造人的方法不同于小说原著，故事发生在上世纪初主...</td>\n",
       "      <td>克隆人要科学家再为他克隆一个属于自己的女人,但没有成功,克隆女人未成型的血肉从水龙头中“流产”.\"</td>\n",
       "      <td>1992-12-29</td>\n",
       "      <td>玛丽·雪莱</td>\n",
       "      <td>兰迪·奎德,朗贝尔·维尔森,约翰·米尔斯,弗农·多布切夫,帕特里克·博金,Ronald Le...</td>\n",
       "      <td>8.3</td>\n",
       "      <td>恐怖,剧情,科幻</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>26565</td>\n",
       "      <td>轻小说的快乐写作法</td>\n",
       "      <td></td>\n",
       "      <td>\"只对海洋生物感兴趣的高中生与八云被出版社工作的表姐心夏差遣，去人气轻小说作家姬宫美樱家里拿...</td>\n",
       "      <td></td>\n",
       "      <td>然而剑如今正因为没有灵感而烦恼，要写的新作学园爱情喜剧丝毫没有进展。剑说因为自...</td>\n",
       "      <td></td>\n",
       "      <td>八云就此和剑展开了恋爱模拟游戏。不料，“学园最萌美少女”市古优奈也插了进来。这...</td>\n",
       "      <td>2010-01-01</td>\n",
       "      <td>大森研一</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3046</td>\n",
       "      <td>柔嫩的脸颊</td>\n",
       "      <td></td>\n",
       "      <td>\"ハイビジョン国際映像祭2001ドラマ部門フェスティバル表彰作品。制作費１億円というドラマと...</td>\n",
       "      <td>BS-i放送曜日</td>\n",
       "      <td>火水放送期間</td>\n",
       "      <td>2001/01/02～2001/01/03放送時間</td>\n",
       "      <td>21:00-23:00放送回数</td>\n",
       "      <td>2 回連続/単発</td>\n",
       "      <td>単発原作</td>\n",
       "      <td>...</td>\n",
       "      <td>（音響効果・帆苅　幸雄）撮影技術</td>\n",
       "      <td>本田　　茂、（照明・高坂　俊秀）（録音・山田　　均）（編集・長崎　俊一、雲　　丹）（オンライ...</td>\n",
       "      <td>VT：東宝ビデオ、DVD：パイオニアLDC美術</td>\n",
       "      <td>金田　克美、（装飾・柴田　博英）（大道具・齋藤　和弘、木村　浩之）（衣裳・中山　邦夫、野中　...</td>\n",
       "      <td>2001-01-01</td>\n",
       "      <td>长崎俊一</td>\n",
       "      <td>天海祐希,三浦友和,松冈俊介,渡边一计</td>\n",
       "      <td>8.4</td>\n",
       "      <td>预告片</td>\n",
       "      <td>31\\n</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>48142</td>\n",
       "      <td>急救爱情狂</td>\n",
       "      <td>犯罪片,爱情片,惊悚片</td>\n",
       "      <td>\"贝蒂是堪萨斯州一个小镇上的餐厅女招待，有一个满身陋习的丈夫德尔，她同时还是一个忠实的肥皂剧...</td>\n",
       "      <td></td>\n",
       "      <td>德尔在一次毒品交易中被杀害，目击此事的贝蒂决定开始自己的新生活，她独自一人驾车去洛杉...</td>\n",
       "      <td>2000-09-08</td>\n",
       "      <td>尼尔·拉布特,James Flamberg</td>\n",
       "      <td>摩根·弗里曼,芮妮·齐薇格,克里斯·洛克,格雷戈·金尼尔,艾伦·艾克哈特,克利斯丁·格拉夫,...</td>\n",
       "      <td>7.8</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>21140</td>\n",
       "      <td>《富春山居图》纪录片</td>\n",
       "      <td></td>\n",
       "      <td>\"中国元代名画《富春山居图》合璧展在即，国际黑市开出天价，日本黑帮、英伦大盗闻风而动。陷于不...</td>\n",
       "      <td>肖锦汉为夺画上天入地钻沙漠，能否力证清白，重现昔日特工风采？林雨嫣博弈西子湖畔，能否与肖锦汉...</td>\n",
       "      <td>2013-01-01</td>\n",
       "      <td>孙健君,鲍勃·布朗</td>\n",
       "      <td>刘德华,林志玲,佟大为,张静初,斯琴高娃,王曼妮</td>\n",
       "      <td>6.1</td>\n",
       "      <td>动作,冒险,犯罪</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>23757</td>\n",
       "      <td>豹女之夺命之旅</td>\n",
       "      <td>动作片,犯罪片</td>\n",
       "      <td>\"关正康(张耀扬)古惑仔一名，女朋友与他社团大佬有染，情绪郁涩，决定向大佬报复，及后逃往泰国...</td>\n",
       "      <td></td>\n",
       "      <td>另一方面，泰国将军猜逢亦对晶片虎视眈眈，猜逢向康等展开连翻追杀，康与SA...</td>\n",
       "      <td>2001-12-20</td>\n",
       "      <td>罗舜泉,欧罗,陈志恒,徐达初</td>\n",
       "      <td>张耀扬,柯受良,吴毅将,太保,黄佩霞</td>\n",
       "      <td>7.2</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>26279</td>\n",
       "      <td>the ten commandments number 5: thou shalt not ...</td>\n",
       "      <td></td>\n",
       "      <td>\"菲尔莫洛伊极端动画选集</td>\n",
       "      <td>Phil Mulloy - Extreme Animation\"</td>\n",
       "      <td>1996-01-01</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>7.9</td>\n",
       "      <td>动画,喜剧,短片,预告片</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>32810</td>\n",
       "      <td>the ten commandments number 3: remember to kee...</td>\n",
       "      <td></td>\n",
       "      <td>\"菲尔莫洛伊极端动画选集</td>\n",
       "      <td>Phil Mulloy - Extreme Animation\"</td>\n",
       "      <td>1995-01-01</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>7.8</td>\n",
       "      <td>动画,喜剧,短片,预告片</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>20900</td>\n",
       "      <td>怪笑少女</td>\n",
       "      <td>男女关系,命运</td>\n",
       "      <td>\"大分裂时期，各国纷争，狼烟四起。大魏国为了抢得军粮，频频来犯大齐国。女主人公小小苏刚一出生...</td>\n",
       "      <td>小小苏踏上了寻找胎记的旅途，但殊不知，一场惊天的阴谋也随之降临……\"</td>\n",
       "      <td>2017-08-04</td>\n",
       "      <td>徐洋</td>\n",
       "      <td>董慧,昌隆,王劲松,盛喆</td>\n",
       "      <td>7.3</td>\n",
       "      <td>古装,喜剧,奇幻</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>18521</td>\n",
       "      <td>322档案</td>\n",
       "      <td></td>\n",
       "      <td>\"Mannheim-Heidelberg International Filmfestiva...</td>\n",
       "      <td>Result</td>\n",
       "      <td>Award</td>\n",
       "      <td>Category/Recipient(s)1969</td>\n",
       "      <td>Won</td>\n",
       "      <td>Grand PrizeDusan HanákA government official in...</td>\n",
       "      <td>1969-01-01</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>11 rows × 25 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "       0                                                  1            2   \\\n",
       "0   28701                                             看不见的月亮          剧情片   \n",
       "1   37918                                               科学怪人      科幻片,惊悚片   \n",
       "2   26565                                          轻小说的快乐写作法                \n",
       "3    3046                                              柔嫩的脸颊                \n",
       "4   48142                                              急救爱情狂  犯罪片,爱情片,惊悚片   \n",
       "5   21140                                         《富春山居图》纪录片                \n",
       "6   23757                                            豹女之夺命之旅      动作片,犯罪片   \n",
       "7   26279  the ten commandments number 5: thou shalt not ...                \n",
       "8   32810  the ten commandments number 3: remember to kee...                \n",
       "9   20900                                               怪笑少女      男女关系,命运   \n",
       "10  18521                                              322档案                \n",
       "\n",
       "                                                   3   \\\n",
       "0                       \"《看不见的月亮》由José Pepe Bojórquez   \n",
       "1   \"以前正大综艺的正大剧场放过的经典影片.电影里造人的方法不同于小说原著，故事发生在上世纪初主...   \n",
       "2   \"只对海洋生物感兴趣的高中生与八云被出版社工作的表姐心夏差遣，去人气轻小说作家姬宫美樱家里拿...   \n",
       "3   \"ハイビジョン国際映像祭2001ドラマ部門フェスティバル表彰作品。制作費１億円というドラマと...   \n",
       "4   \"贝蒂是堪萨斯州一个小镇上的餐厅女招待，有一个满身陋习的丈夫德尔，她同时还是一个忠实的肥皂剧...   \n",
       "5   \"中国元代名画《富春山居图》合璧展在即，国际黑市开出天价，日本黑帮、英伦大盗闻风而动。陷于不...   \n",
       "6   \"关正康(张耀扬)古惑仔一名，女朋友与他社团大佬有染，情绪郁涩，决定向大佬报复，及后逃往泰国...   \n",
       "7                                        \"菲尔莫洛伊极端动画选集   \n",
       "8                                        \"菲尔莫洛伊极端动画选集   \n",
       "9   \"大分裂时期，各国纷争，狼烟四起。大魏国为了抢得军粮，频频来犯大齐国。女主人公小小苏刚一出生...   \n",
       "10  \"Mannheim-Heidelberg International Filmfestiva...   \n",
       "\n",
       "                                                   4   \\\n",
       "0   执导。韦斯·本特利、安娜·莎若狄拉、乔纳森·斯卡奇、赫克特·吉门雷兹主演。父亲去世后，Vic...   \n",
       "1   克隆人要科学家再为他克隆一个属于自己的女人,但没有成功,克隆女人未成型的血肉从水龙头中“流产”.\"   \n",
       "2                                                       \n",
       "3                                            BS-i放送曜日   \n",
       "4                                                       \n",
       "5   肖锦汉为夺画上天入地钻沙漠，能否力证清白，重现昔日特工风采？林雨嫣博弈西子湖畔，能否与肖锦汉...   \n",
       "6                                                       \n",
       "7                    Phil Mulloy - Extreme Animation\"   \n",
       "8                    Phil Mulloy - Extreme Animation\"   \n",
       "9                  小小苏踏上了寻找胎记的旅途，但殊不知，一场惊天的阴谋也随之降临……\"   \n",
       "10                                             Result   \n",
       "\n",
       "                                                   5   \\\n",
       "0                                          2012-11-23   \n",
       "1                                          1992-12-29   \n",
       "2          然而剑如今正因为没有灵感而烦恼，要写的新作学园爱情喜剧丝毫没有进展。剑说因为自...   \n",
       "3                                              火水放送期間   \n",
       "4   　　　　德尔在一次毒品交易中被杀害，目击此事的贝蒂决定开始自己的新生活，她独自一人驾车去洛杉...   \n",
       "5                                          2013-01-01   \n",
       "6   　　        另一方面，泰国将军猜逢亦对晶片虎视眈眈，猜逢向康等展开连翻追杀，康与SA...   \n",
       "7                                          1996-01-01   \n",
       "8                                          1995-01-01   \n",
       "9                                          2017-08-04   \n",
       "10                                              Award   \n",
       "\n",
       "                            6   \\\n",
       "0          José Pepe Bojórquez   \n",
       "1                        玛丽·雪莱   \n",
       "2                                \n",
       "3    2001/01/02～2001/01/03放送時間   \n",
       "4                   2000-09-08   \n",
       "5                    孙健君,鲍勃·布朗   \n",
       "6                   2001-12-20   \n",
       "7                                \n",
       "8                                \n",
       "9                           徐洋   \n",
       "10  Category/Recipient(s)1969    \n",
       "\n",
       "                                                   7   \\\n",
       "0                           安娜·莎若狄拉,赫克特·吉门雷兹,安赫丽卡·玛丽亚   \n",
       "1   兰迪·奎德,朗贝尔·维尔森,约翰·米尔斯,弗农·多布切夫,帕特里克·博金,Ronald Le...   \n",
       "2          八云就此和剑展开了恋爱模拟游戏。不料，“学园最萌美少女”市古优奈也插了进来。这...   \n",
       "3                                     21:00-23:00放送回数   \n",
       "4                               尼尔·拉布特,James Flamberg   \n",
       "5                            刘德华,林志玲,佟大为,张静初,斯琴高娃,王曼妮   \n",
       "6                                      罗舜泉,欧罗,陈志恒,徐达初   \n",
       "7                                                       \n",
       "8                                                       \n",
       "9                                        董慧,昌隆,王劲松,盛喆   \n",
       "10                                               Won    \n",
       "\n",
       "                                                   8             9   ...  \\\n",
       "0                                                 7.1         悬疑,剧情  ...   \n",
       "1                                                 8.3      恐怖,剧情,科幻  ...   \n",
       "2                                          2010-01-01          大森研一  ...   \n",
       "3                                            2 回連続/単発          単発原作  ...   \n",
       "4   摩根·弗里曼,芮妮·齐薇格,克里斯·洛克,格雷戈·金尼尔,艾伦·艾克哈特,克利斯丁·格拉夫,...           7.8  ...   \n",
       "5                                                 6.1      动作,冒险,犯罪  ...   \n",
       "6                                  张耀扬,柯受良,吴毅将,太保,黄佩霞           7.2  ...   \n",
       "7                                                 7.9  动画,喜剧,短片,预告片  ...   \n",
       "8                                                 7.8  动画,喜剧,短片,预告片  ...   \n",
       "9                                                 7.3      古装,喜剧,奇幻  ...   \n",
       "10  Grand PrizeDusan HanákA government official in...    1969-01-01  ...   \n",
       "\n",
       "                  15                                                 16  \\\n",
       "0               None                                               None   \n",
       "1               None                                               None   \n",
       "2               None                                               None   \n",
       "3   （音響効果・帆苅　幸雄）撮影技術  本田　　茂、（照明・高坂　俊秀）（録音・山田　　均）（編集・長崎　俊一、雲　　丹）（オンライ...   \n",
       "4               None                                               None   \n",
       "5               None                                               None   \n",
       "6               None                                               None   \n",
       "7               None                                               None   \n",
       "8               None                                               None   \n",
       "9               None                                               None   \n",
       "10              None                                               None   \n",
       "\n",
       "                         17  \\\n",
       "0                      None   \n",
       "1                      None   \n",
       "2                      None   \n",
       "3   VT：東宝ビデオ、DVD：パイオニアLDC美術   \n",
       "4                      None   \n",
       "5                      None   \n",
       "6                      None   \n",
       "7                      None   \n",
       "8                      None   \n",
       "9                      None   \n",
       "10                     None   \n",
       "\n",
       "                                                   18          19    20  \\\n",
       "0                                                None        None  None   \n",
       "1                                                None        None  None   \n",
       "2                                                None        None  None   \n",
       "3   金田　克美、（装飾・柴田　博英）（大道具・齋藤　和弘、木村　浩之）（衣裳・中山　邦夫、野中　...  2001-01-01  长崎俊一   \n",
       "4                                                None        None  None   \n",
       "5                                                None        None  None   \n",
       "6                                                None        None  None   \n",
       "7                                                None        None  None   \n",
       "8                                                None        None  None   \n",
       "9                                                None        None  None   \n",
       "10                                               None        None  None   \n",
       "\n",
       "                     21    22    23    24  \n",
       "0                  None  None  None  None  \n",
       "1                  None  None  None  None  \n",
       "2                  None  None  None  None  \n",
       "3   天海祐希,三浦友和,松冈俊介,渡边一计   8.4   预告片  31\\n  \n",
       "4                  None  None  None  None  \n",
       "5                  None  None  None  None  \n",
       "6                  None  None  None  None  \n",
       "7                  None  None  None  None  \n",
       "8                  None  None  None  None  \n",
       "9                  None  None  None  None  \n",
       "10                 None  None  None  None  \n",
       "\n",
       "[11 rows x 25 columns]"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
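   ],
   "source": [
    "# Show the raw lines that do not split into exactly 10 tab-separated fields;\n",
    "# tabs embedded in the quoted description text produce the extra columns.\n",
    "pd.DataFrame([x.split('\\t') for x in lines if len(x.split('\\t')) != 10])"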
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-08-15T18:43:44.390617Z",
     "iopub.status.busy": "2021-08-15T18:43:44.390145Z",
     "iopub.status.idle": "2021-08-15T18:54:20.284331Z",
     "shell.execute_reply": "2021-08-15T18:54:20.283788Z",
     "shell.execute_reply.started": "2021-08-15T18:43:44.390584Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Populating the interactive namespace from numpy and matplotlib\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/lyz/.local/lib/python3.6/site-packages/IPython/core/interactiveshell.py:3263: DtypeWarning: Columns (6) have mixed types.Specify dtype option on import or set low_memory=False.\n",
      "  if (await self.run_code(code, result,  async_=asy)):\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "378.43 Mb, 131.40 Mb (65.28 %)\n",
      "405.14 Mb, 140.67 Mb (65.28 %)\n",
      "437.46 Mb, 151.90 Mb (65.28 %)\n",
      "374.51 Mb, 130.04 Mb (65.28 %)\n",
      "387.82 Mb, 134.66 Mb (65.28 %)\n",
      "372.54 Mb, 129.35 Mb (65.28 %)\n",
      "383.36 Mb, 133.11 Mb (65.28 %)\n",
      "397.30 Mb, 137.95 Mb (65.28 %)\n",
      "444.39 Mb, 154.30 Mb (65.28 %)\n",
      "347.10 Mb, 120.52 Mb (65.28 %)\n",
      "368.09 Mb, 127.81 Mb (65.28 %)\n",
      "448.31 Mb, 155.66 Mb (65.28 %)\n",
      "388.03 Mb, 134.73 Mb (65.28 %)\n",
      "379.69 Mb, 131.84 Mb (65.28 %)\n",
      "43.06 Mb, 21.53 Mb (50.00 %)\n",
      "360.77 Mb, 73.28 Mb (79.69 %)\n",
      "3.79 Mb, 3.04 Mb (20.00 %)\n",
      "2526.41 Mb, 2526.41 Mb (0.00 %)\n"
     ]
    }
   ],
   "source": [
    "import pandas as pd\n",
    "import glob, gc\n",
    "\n",
    "# %pylab inline pulls numpy (as np) and matplotlib names into the namespace; np is used below.\n",
    "%pylab inline\n",
    "import seaborn as sns\n",
    "\n",
    "INPUT_PATH = './3'\n",
    "\n",
    "# Downcast each numeric column to the narrowest dtype whose range covers its min/max,\n",
    "# then print memory before and after and the percentage saved.\n",
    "def reduce_mem(df):\n",
    "    start_mem = df.memory_usage().sum() / 1024 ** 2\n",
    "    for col in df.columns:\n",
    "        col_type = df[col].dtypes\n",
    "        if col_type != object:\n",
    "            c_min = df[col].min()\n",
    "            c_max = df[col].max()\n",
    "            if str(col_type)[:3] == 'int':\n",
    "                if c_min > np.iinfo(np.int8).min and c_max < np.iinfo(np.int8).max:\n",
    "                    df[col] = df[col].astype(np.int8)\n",
    "                elif c_min > np.iinfo(np.int16).min and c_max < np.iinfo(np.int16).max:\n",
    "                    df[col] = df[col].astype(np.int16)\n",
    "                elif c_min > np.iinfo(np.int32).min and c_max < np.iinfo(np.int32).max:\n",
    "                    df[col] = df[col].astype(np.int32)\n",
    "                elif c_min > np.iinfo(np.int64).min and c_max < np.iinfo(np.int64).max:\n",
    "                    df[col] = df[col].astype(np.int64)\n",
    "            else:\n",
    "                # Caution: float16 keeps only about 3 significant decimal digits,\n",
    "                # so a video_score of 7.4 is stored as 7.398438.\n",
    "                if c_min > np.finfo(np.float16).min and c_max < np.finfo(np.float16).max:\n",
    "                    df[col] = df[col].astype(np.float16)\n",
    "                elif c_min > np.finfo(np.float32).min and c_max < np.finfo(np.float32).max:\n",
    "                    df[col] = df[col].astype(np.float32)\n",
    "                else:\n",
    "                    df[col] = df[col].astype(np.float64)\n",
    "    end_mem = df.memory_usage().sum() / 1024 ** 2\n",
    "    print('{:.2f} Mb, {:.2f} Mb ({:.2f} %)'.format(start_mem, end_mem, 100 * (start_mem - end_mem) / start_mem))\n",
    "    gc.collect()\n",
    "    return df\n",
    "\n",
    "# test.csv is comma-separated; the training tables are tab-separated.\n",
    "test_data = pd.read_csv(f'{INPUT_PATH}/testdata/test.csv', sep=',')\n",
    "user_features = pd.read_csv(f'{INPUT_PATH}/traindata/user_features_data/user_features_data.csv', sep='\\t')\n",
    "video_features = pd.read_csv(f'{INPUT_PATH}/traindata/video_features_data/video_features_data.csv', sep='\\t')\n",
    "# Load the history_behavior files one at a time, downcasting each before concatenation to limit peak memory.\n",
    "history_behavior = pd.concat([\n",
    "    reduce_mem(pd.read_csv(x, sep='\\t')) for x in glob.glob(f'{INPUT_PATH}/traindata/history_behavior_data/*/*')\n",
    "])\n",
    "history_behavior = history_behavior.sort_values(by=['pt_d', 'user_id'])\n",
    "\n",
    "test_data = reduce_mem(test_data)\n",
    "user_features = reduce_mem(user_features)\n",
    "video_features = reduce_mem(video_features)\n",
    "history_behavior = reduce_mem(history_behavior)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-08-15T18:54:47.352634Z",
     "iopub.status.busy": "2021-08-15T18:54:47.352125Z",
     "iopub.status.idle": "2021-08-15T18:54:47.367509Z",
     "shell.execute_reply": "2021-08-15T18:54:47.366883Z",
     "shell.execute_reply.started": "2021-08-15T18:54:47.352596Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>user_id</th>\n",
       "      <th>age</th>\n",
       "      <th>gender</th>\n",
       "      <th>country</th>\n",
       "      <th>province</th>\n",
       "      <th>city</th>\n",
       "      <th>city_level</th>\n",
       "      <th>device_name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1757005</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>9</td>\n",
       "      <td>6</td>\n",
       "      <td>3</td>\n",
       "      <td>327</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>17938</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "      <td>22</td>\n",
       "      <td>3</td>\n",
       "      <td>327</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>4263520</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>19</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>327</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1411600</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>5</td>\n",
       "      <td>138</td>\n",
       "      <td>1</td>\n",
       "      <td>327</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>3992242</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>142</td>\n",
       "      <td>0</td>\n",
       "      <td>327</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   user_id  age  gender  country  province  city  city_level  device_name\n",
       "0  1757005    3       1        0         9     6           3          327\n",
       "1    17938    0       0        0         4    22           3          327\n",
       "2  4263520    1       0        0        19     1           5          327\n",
       "3  1411600    3       0        0         5   138           1          327\n",
       "4  3992242    2       0        0         0   142           0          327"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_features.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-08-15T18:54:47.792795Z",
     "iopub.status.busy": "2021-08-15T18:54:47.792240Z",
     "iopub.status.idle": "2021-08-15T18:54:48.331996Z",
     "shell.execute_reply": "2021-08-15T18:54:48.331377Z",
     "shell.execute_reply.started": "2021-08-15T18:54:47.792748Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>video_id</th>\n",
       "      <th>video_name</th>\n",
       "      <th>video_tags</th>\n",
       "      <th>video_description</th>\n",
       "      <th>video_release_date</th>\n",
       "      <th>video_director_list</th>\n",
       "      <th>video_actor_list</th>\n",
       "      <th>video_score</th>\n",
       "      <th>video_second_class</th>\n",
       "      <th>video_duration</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>3460</td>\n",
       "      <td>脱皮爸爸</td>\n",
       "      <td>院线电影,家庭关系,命运</td>\n",
       "      <td>中年失意的儿子田力行（古天乐饰）在生活上遇到了重重危机：母亲病逝,工作不顺,妻子要求离婚。正...</td>\n",
       "      <td>2017-04-27</td>\n",
       "      <td>司徒慧焯</td>\n",
       "      <td>吴镇宇,古天乐,春夏,蔡洁</td>\n",
       "      <td>7.398438</td>\n",
       "      <td>剧情,喜剧,奇幻</td>\n",
       "      <td>5913</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>14553</td>\n",
       "      <td>喜气洋洋小金莲</td>\n",
       "      <td>古装喜剧,剧情片,喜剧片,内地电影,欢乐喜剧,爱情纠纷</td>\n",
       "      <td>故事始于西门庆为西门药业的“伟哥”产品寻找代言人，西门庆初见潘金莲，一时惊为天人，为成功抱得...</td>\n",
       "      <td>2015-12-30</td>\n",
       "      <td>杨珊珊,李亚玲</td>\n",
       "      <td>陈南飞,程隆妮,王闯,贾海涛,闫薇儿</td>\n",
       "      <td>5.601562</td>\n",
       "      <td>喜剧</td>\n",
       "      <td>6217</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1214</td>\n",
       "      <td>风流家族</td>\n",
       "      <td>男女关系,家庭关系,命运,院线电影</td>\n",
       "      <td>香世仁（钟镇涛 饰）是家财万贯的香港富豪，在满足了一切物质上的要求后，他将生活的重心放在了儿...</td>\n",
       "      <td>2002-03-07</td>\n",
       "      <td>邱礼涛,杨漪珊</td>\n",
       "      <td>张家辉,卢巧音,钟镇涛,叶童,李蕙敏,张坚庭,袁洁莹,黄佩霞,齐芷瑶,刘以达,叶伟信,邹凯光...</td>\n",
       "      <td>6.800781</td>\n",
       "      <td>都市,喜剧,爱情,家庭</td>\n",
       "      <td>5963</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>30639</td>\n",
       "      <td>大提琴的故事</td>\n",
       "      <td>短片,动画片</td>\n",
       "      <td>低音大提琴演奏家史密斯科夫正要去参加某贵族的沙龙，途中他被河边的美丽景色所吸引，驻足观看。兴...</td>\n",
       "      <td>1949-01-01</td>\n",
       "      <td>伊里·特恩卡,契诃夫</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>动画,爱情</td>\n",
       "      <td>17371</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>38522</td>\n",
       "      <td>歌舞大王齐格飞</td>\n",
       "      <td>喜剧片,人物传记,浪漫爱情</td>\n",
       "      <td>罗伯特．Z．伦纳德导演的这部影片以百老汇最大的歌舞团——齐格菲歌舞团的创办人佛罗伦斯．齐格菲...</td>\n",
       "      <td>1936-04-08</td>\n",
       "      <td>罗伯特·Z·伦纳德,William Anthony McGuire</td>\n",
       "      <td>威廉·鲍威尔,玛娜·洛伊,路易丝·赖纳,弗兰克·摩根,范妮·布莱斯,弗吉尼亚·布鲁斯,雷吉纳...</td>\n",
       "      <td>7.699219</td>\n",
       "      <td>剧情,歌舞,喜剧</td>\n",
       "      <td>10608</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   video_id video_name                   video_tags  \\\n",
       "0      3460       脱皮爸爸                 院线电影,家庭关系,命运   \n",
       "1     14553    喜气洋洋小金莲  古装喜剧,剧情片,喜剧片,内地电影,欢乐喜剧,爱情纠纷   \n",
       "2      1214       风流家族            男女关系,家庭关系,命运,院线电影   \n",
       "3     30639     大提琴的故事                       短片,动画片   \n",
       "4     38522    歌舞大王齐格飞                喜剧片,人物传记,浪漫爱情   \n",
       "\n",
       "                                   video_description video_release_date  \\\n",
       "0  中年失意的儿子田力行（古天乐饰）在生活上遇到了重重危机：母亲病逝,工作不顺,妻子要求离婚。正...         2017-04-27   \n",
       "1  故事始于西门庆为西门药业的“伟哥”产品寻找代言人，西门庆初见潘金莲，一时惊为天人，为成功抱得...         2015-12-30   \n",
       "2  香世仁（钟镇涛 饰）是家财万贯的香港富豪，在满足了一切物质上的要求后，他将生活的重心放在了儿...         2002-03-07   \n",
       "3  低音大提琴演奏家史密斯科夫正要去参加某贵族的沙龙，途中他被河边的美丽景色所吸引，驻足观看。兴...         1949-01-01   \n",
       "4  罗伯特．Z．伦纳德导演的这部影片以百老汇最大的歌舞团——齐格菲歌舞团的创办人佛罗伦斯．齐格菲...         1936-04-08   \n",
       "\n",
       "                 video_director_list  \\\n",
       "0                               司徒慧焯   \n",
       "1                            杨珊珊,李亚玲   \n",
       "2                            邱礼涛,杨漪珊   \n",
       "3                         伊里·特恩卡,契诃夫   \n",
       "4  罗伯特·Z·伦纳德,William Anthony McGuire   \n",
       "\n",
       "                                    video_actor_list  video_score  \\\n",
       "0                                      吴镇宇,古天乐,春夏,蔡洁     7.398438   \n",
       "1                                 陈南飞,程隆妮,王闯,贾海涛,闫薇儿     5.601562   \n",
       "2  张家辉,卢巧音,钟镇涛,叶童,李蕙敏,张坚庭,袁洁莹,黄佩霞,齐芷瑶,刘以达,叶伟信,邹凯光...     6.800781   \n",
       "3                                                NaN          NaN   \n",
       "4  威廉·鲍威尔,玛娜·洛伊,路易丝·赖纳,弗兰克·摩根,范妮·布莱斯,弗吉尼亚·布鲁斯,雷吉纳...     7.699219   \n",
       "\n",
       "  video_second_class  video_duration  \n",
       "0           剧情,喜剧,奇幻            5913  \n",
       "1                 喜剧            6217  \n",
       "2        都市,喜剧,爱情,家庭            5963  \n",
       "3              动画,爱情           17371  \n",
       "4           剧情,歌舞,喜剧           10608  "
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "video_features.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-08-15T18:54:49.429844Z",
     "iopub.status.busy": "2021-08-15T18:54:49.429319Z",
     "iopub.status.idle": "2021-08-15T18:54:49.443109Z",
     "shell.execute_reply": "2021-08-15T18:54:49.442487Z",
     "shell.execute_reply.started": "2021-08-15T18:54:49.429800Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>user_id</th>\n",
       "      <th>video_id</th>\n",
       "      <th>is_watch</th>\n",
       "      <th>is_share</th>\n",
       "      <th>is_collect</th>\n",
       "      <th>is_comment</th>\n",
       "      <th>watch_start_time</th>\n",
       "      <th>watch_label</th>\n",
       "      <th>pt_d</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>3275583</th>\n",
       "      <td>2</td>\n",
       "      <td>22485</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>2021-04-19</td>\n",
       "      <td>3</td>\n",
       "      <td>20210419</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3275584</th>\n",
       "      <td>2</td>\n",
       "      <td>25469</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>2021-04-18</td>\n",
       "      <td>3</td>\n",
       "      <td>20210419</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5391986</th>\n",
       "      <td>2</td>\n",
       "      <td>28411</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>20210419</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5391987</th>\n",
       "      <td>2</td>\n",
       "      <td>49484</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>20210419</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5391988</th>\n",
       "      <td>2</td>\n",
       "      <td>7069</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>20210419</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         user_id  video_id  is_watch  is_share  is_collect  is_comment  \\\n",
       "3275583        2     22485         1         0           0           0   \n",
       "3275584        2     25469         1         0           0           0   \n",
       "5391986        2     28411         0         0           0           0   \n",
       "5391987        2     49484         0         0           0           0   \n",
       "5391988        2      7069         0         0           0           0   \n",
       "\n",
       "        watch_start_time  watch_label      pt_d  \n",
       "3275583       2021-04-19            3  20210419  \n",
       "3275584       2021-04-18            3  20210419  \n",
       "5391986              NaN            0  20210419  \n",
       "5391987              NaN            0  20210419  \n",
       "5391988              NaN            0  20210419  "
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "history_behavior.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Data Analysis\n",
    "\n",
    "## User Features"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-20T13:23:07.496865Z",
     "iopub.status.busy": "2021-07-20T13:23:07.496331Z",
     "iopub.status.idle": "2021-07-20T13:23:07.505896Z",
     "shell.execute_reply": "2021-07-20T13:23:07.504753Z",
     "shell.execute_reply.started": "2021-07-20T13:23:07.496819Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "user_id        int32\n",
       "age             int8\n",
       "gender          int8\n",
       "country         int8\n",
       "province        int8\n",
       "city           int16\n",
       "city_level      int8\n",
       "device_name    int16\n",
       "dtype: object"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_features.dtypes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-20T13:23:07.876951Z",
     "iopub.status.busy": "2021-07-20T13:23:07.876453Z",
     "iopub.status.idle": "2021-07-20T13:23:07.885279Z",
     "shell.execute_reply": "2021-07-20T13:23:07.884739Z",
     "shell.execute_reply.started": "2021-07-20T13:23:07.876908Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>user_id</th>\n",
       "      <th>age</th>\n",
       "      <th>gender</th>\n",
       "      <th>country</th>\n",
       "      <th>province</th>\n",
       "      <th>city</th>\n",
       "      <th>city_level</th>\n",
       "      <th>device_name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1757005</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>9</td>\n",
       "      <td>6</td>\n",
       "      <td>3</td>\n",
       "      <td>327</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>17938</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "      <td>22</td>\n",
       "      <td>3</td>\n",
       "      <td>327</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>4263520</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>19</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>327</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1411600</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>5</td>\n",
       "      <td>138</td>\n",
       "      <td>1</td>\n",
       "      <td>327</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>3992242</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>142</td>\n",
       "      <td>0</td>\n",
       "      <td>327</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   user_id  age  gender  country  province  city  city_level  device_name\n",
       "0  1757005    3       1        0         9     6           3          327\n",
       "1    17938    0       0        0         4    22           3          327\n",
       "2  4263520    1       0        0        19     1           5          327\n",
       "3  1411600    3       0        0         5   138           1          327\n",
       "4  3992242    2       0        0         0   142           0          327"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_features.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-20T13:23:08.257439Z",
     "iopub.status.busy": "2021-07-20T13:23:08.256930Z",
     "iopub.status.idle": "2021-07-20T13:23:08.793496Z",
     "shell.execute_reply": "2021-07-20T13:23:08.792973Z",
     "shell.execute_reply.started": "2021-07-20T13:23:08.257394Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(5910800, 8) 5910800\n",
      "(5910800, 8) 8\n",
      "(5910800, 8) 4\n",
      "(5910800, 8) 3\n",
      "(5910800, 8) 33\n",
      "(5910800, 8) 339\n",
      "(5910800, 8) 8\n",
      "(5910800, 8) 1826\n"
     ]
    }
   ],
   "source": [
    "for col in user_features.columns:\n",
    "    print(user_features.shape, user_features[col].nunique())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-20T13:23:08.794635Z",
     "iopub.status.busy": "2021-07-20T13:23:08.794458Z",
     "iopub.status.idle": "2021-07-20T13:23:09.077807Z",
     "shell.execute_reply": "2021-07-20T13:23:09.077357Z",
     "shell.execute_reply.started": "2021-07-20T13:23:08.794619Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<AxesSubplot:>"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXQAAAEACAYAAACj0I2EAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/Il7ecAAAACXBIWXMAAAsTAAALEwEAmpwYAAAQfElEQVR4nO3df6xfdX3H8eeLAk5F2bJeDaOtJbOojfjzCsswo84fK7DQbHOGzplpgP5jnYvO0E0DG2YJbMvcTFDXTGS6SAduc3VUy6JsbCquF1GgELACSpmzFVCjOKHzvT++p/rlctvvt+1pv/d+eD6Sm3vO53x6zqu397567vl+z/ebqkKStPAdNekAkqR+WOiS1AgLXZIaYaFLUiMsdElqhIUuSY2YaKEnuSLJriS3jTn/dUluT7I9yUcPdz5JWkgyyeehJ/kl4HvAh6vq+SPmrgCuBn65qh5K8oyq2nUkckrSQjDRM/SqugF4cHgsyc8n+VSSm5L8R5LndpsuAC6vqoe6P2uZS9KQ+XgNfSPwlqp6KfD7wPu68ZOBk5N8NsmNSVZPLKEkzUNHTzrAsCTHAb8IXJNk7/CTus9HAyuAVcAS4IYkp1TVt49wTEmal+ZVoTP4jeHbVfWiObbtBL5QVY8C9yS5i0HBbzuC+SRp3ppXl1yq6rsMyvo3ATLwwm7zxxmcnZNkMYNLMHdPIKYkzUuTftriVcDngeck2ZnkPOD1wHlJvgxsB9Z007cCDyS5HbgeeEdVPTCJ3JI0H030aYuSpP7Mq0sukqSDZ6FLUiMm9iyXxYsX1/Llyyd1eElakG666aZvVdXUXNsmVujLly9nZmZmUoeXpAUpydf2tc1LLpLUCAtdkhphoUtSIyx0SWqEhS5JjbDQJakRFrokNcJCl6RGzLfXQ5/T8g3X9r7Pey89u/d9StIkeYYuSY2w0CWpERa6JDXCQpekRljoktQIC12SGmGhS1IjLHRJaoSFLkmNsNAlqREWuiQ1wkKXpEaMLPQkVyTZleS2EfNelmRPktf2F0+SNK5xztCvBFbvb0KSRcBlwHU9ZJIkHYSRhV5VNwAPjpj2FuAfgF19hJIkHbhDvoae5ETg14D3jzF3XZKZJDO7d+8+1ENLkob08aDoXwIXVtWPRk2sqo1VNV1V01NTUz0cWpK0Vx/vWDQNbEoCsBg4K8meqvp4D/uWJI3pkAu9qk7au5zkSuBfLHNJOvJGFnqSq4BVwOIkO4GLgWMAquoDhzWdJGlsIwu9qtaOu7OqeuMhpZEkHTTvFJWkRljoktQIC12SGmGhS1IjLHRJaoSFLkmNsNAlqREWuiQ1wkKXpEZY6JLUCAtdkhphoUtSIyx0SWqEhS5JjbDQJakRFrokNcJCl6RGWOiS1IiRhZ7kiiS7kty2j+2vT3JLkluTfC7JC/uPKUkaZZwz9CuB1fvZfg9wRlWdArwb2NhDLknSARrnTaJvSLJ8P9s/N7R6I7Ckh1ySpAPU9zX084BP9rxPSdIYRp6hjyvJKxgU+sv3M2cdsA5g2bJlfR1akkRPZ+hJXgD8DbCmqh7Y17yq2lhV01U1PTU11cehJUmdQy70JMuAfwTeUFV3HXokSdLBGHnJJclVwCpgcZKdwMXAMQBV9QHgIuBngfclAdhTVdOHK7AkaW7jPMtl7Yjt5wPn95ZoAVu+4dpe93fvpWf3uj9JbfNOUUlqhIUuSY2w0CWpERa6JDWitxuLtDD0/cAt+OCtNF94hi5JjbDQJakRFrokNcJCl6RGWOiS1AgLXZIaYaFLUiMsdElqhIUuSY2w0CWpERa6JDXCQpekRljoktSIkYWe5Ioku5Lcto/tSfLeJDuS3JLkJf3HlCSNMs4Z+pXA6v1sPxNY0X2sA95/6LEkSQdqZKFX1Q3Ag/uZsgb4cA3cCPx0khP6CihJGk8f19BPBO4bWt/ZjUmSjqAj+qBoknVJZpLM7N69+0geWpKa10eh3w8s
HVpf0o09TlVtrKrpqpqemprq4dCSpL36eE/RzcD6JJuA04DvVNU3etivnsB871PpwI0s9CRXAauAxUl2AhcDxwBU1QeALcBZwA7gYeBNhyusJGnfRhZ6Va0dsb2AN/eWSJJ0ULxTVJIaYaFLUiMsdElqhIUuSY2w0CWpERa6JDXCQpekRljoktQIC12SGmGhS1IjLHRJaoSFLkmNsNAlqREWuiQ1wkKXpEZY6JLUCAtdkhrRx3uKSk9Yvvep5pOxztCTrE5yZ5IdSTbMsX1ZkuuT3JzkliRn9R9VkrQ/Iws9ySLgcuBMYCWwNsnKWdPeBVxdVS8GzgXe13dQSdL+jXOGfiqwo6rurqpHgE3AmllzCnh6t3w88N/9RZQkjWOca+gnAvcNre8ETps154+A65K8BXgq8Kpe0kmSxtbXs1zWAldW1RLgLOAjSR637yTrkswkmdm9e3dPh5YkwXiFfj+wdGh9STc27DzgaoCq+jzwU8Di2Tuqqo1VNV1V01NTUweXWJI0p3EKfRuwIslJSY5l8KDn5llzvg68EiDJ8xgUuqfgknQEjSz0qtoDrAe2AncweDbL9iSXJDmnm/Z24IIkXwauAt5YVXW4QkuSHm+sG4uqaguwZdbYRUPLtwOn9xtNknQgvPVfkhphoUtSIyx0SWqEhS5JjbDQJakRFrokNcJCl6RGWOiS1AgLXZIaYaFLUiMsdElqhIUuSY2w0CWpERa6JDXCQpekRljoktQIC12SGmGhS1IjLHRJasRYhZ5kdZI7k+xIsmEfc16X5PYk25N8tN+YkqRRRr5JdJJFwOXAq4GdwLYkm7s3ht47ZwXwB8DpVfVQkmccrsCSpLmNc4Z+KrCjqu6uqkeATcCaWXMuAC6vqocAqmpXvzElSaOMU+gnAvcNre/sxoadDJyc5LNJbkyyuq+AkqTxjLzkcgD7WQGsApYANyQ5paq+PTwpyTpgHcCyZct6OrQkCcY7Q78fWDq0vqQbG7YT2FxVj1bVPcBdDAr+MapqY1VNV9X01NTUwWaWJM1hnELfBqxIclKSY4Fzgc2z5nycwdk5SRYzuARzd38xJUmjjCz0qtoDrAe2AncAV1fV9iSXJDmnm7YVeCDJ7cD1wDuq6oHDFVqS9HhjXUOvqi3AllljFw0tF/C27kOSNAHeKSpJjbDQJakRFrokNcJCl6RGWOiS1AgLXZIaYaFLUiMsdElqhIUuSY2w0CWpERa6JDXCQpekRljoktQIC12SGmGhS1IjLHRJaoSFLkmNsNAlqRFjFXqS1UnuTLIjyYb9zPuNJJVkur+IkqRxjCz0JIuAy4EzgZXA2iQr55j3NOCtwBf6DilJGm2cM/RTgR1VdXdVPQJsAtbMMe/dwGXA//aYT5I0pnEK/UTgvqH1nd3YjyV5CbC0qq7tMZsk6QAc8oOiSY4C/gJ4+xhz1yWZSTKze/fuQz20JGnIOIV+P7B0aH1JN7bX04DnA/+W5F7gF4DNcz0wWlUbq2q6qqanpqYOPrUk6XHGKfRtwIokJyU5FjgX2Lx3Y1V9p6oWV9XyqloO3AicU1UzhyWxJGlOIwu9qvYA64GtwB3A1VW1PcklSc453AElSeM5epxJVbUF2DJr7KJ9zF116LEkSQfKO0UlqREWuiQ1wkKXpEZY6JLUCAtdkhphoUtSIyx0SWqEhS5JjbDQJakRFrokNcJCl6RGWOiS1AgLXZIaYaFLUiMsdElqhIUuSY2w0CWpERa6JDVirEJPsjrJnUl2JNkwx/a3Jbk9yS1JPp3kWf1HlSTtz8hCT7IIuBw4E1gJrE2ycta0m4HpqnoB8DHgT/sOKknav3HO0E8FdlTV3VX1CLAJWDM8oaqur6qHu9UbgSX9xpQkjTJOoZ8I3De0vrMb25fzgE8eSihJ0oE7us+dJfltYBo4Yx/b1wHrAJYtW9bnoSXpCW+cM/T7gaVD60u6scdI8irgncA5VfXDuXZUVRurarqqpqempg4mryRpH8Yp
9G3AiiQnJTkWOBfYPDwhyYuBv2ZQ5rv6jylJGmXkJZeq2pNkPbAVWARcUVXbk1wCzFTVZuDPgOOAa5IAfL2qzjmMuSUdgOUbru19n/deenbv+9ShGesaelVtAbbMGrtoaPlVPeeSJB0g7xSVpEZY6JLUCAtdkhphoUtSIyx0SWqEhS5JjbDQJakRFrokNcJCl6RGWOiS1AgLXZIaYaFLUiN6fYMLSToUvirkofEMXZIaYaFLUiMsdElqhIUuSY2w0CWpERa6JDVirEJPsjrJnUl2JNkwx/YnJfn7bvsXkizvPakkab9GFnqSRcDlwJnASmBtkpWzpp0HPFRVzwbeA1zWd1BJ0v6Nc4Z+KrCjqu6uqkeATcCaWXPWAH/bLX8MeGWS9BdTkjRKqmr/E5LXAqur6vxu/Q3AaVW1fmjObd2cnd36V7s535q1r3XAum71OcCdff1FOouBb42cNXnm7Jc5+7MQMsITO+ezqmpqrg1H9Nb/qtoIbDxc+08yU1XTh2v/fTFnv8zZn4WQEcy5L+NccrkfWDq0vqQbm3NOkqOB44EH+ggoSRrPOIW+DViR5KQkxwLnAptnzdkM/E63/FrgMzXqWo4kqVcjL7lU1Z4k64GtwCLgiqranuQSYKaqNgMfBD6SZAfwIIPSn4TDdjmnZ+bslzn7sxAygjnnNPJBUUnSwuCdopLUCAtdkhphoUtSIxZ0oSd5bpILk7y3+7gwyfMmnWuh6r6er0xy3Kzx1ZPKNFuSU5O8rFtemeRtSc6adK5Rknx40hlGSfLy7uv5mklnGZbktCRP75afnOSPk3wiyWVJjp90vr2S/G6SpaNnHsYMC/VB0SQXAmsZvBTBzm54CYNn2GyqqksnlW1cSd5UVR+adA4YfDMCbwbuAF4EvLWq/rnb9sWqeskE49HluJjBawodDfwrcBpwPfBqYGtV/ckE4/1YktlP6w3wCuAzAFV1zhEPNYck/1VVp3bLFzD49/8n4DXAJ+bLz1CS7cALu2fcbQQepnuJkW781ycasJPkO8D3ga8CVwHXVNXuIxqiqhbkB3AXcMwc48cCX5l0vjH/Dl+fdIahLLcCx3XLy4EZBqUOcPOk8w1lXAQ8Bfgu8PRu/MnALZPON5Tzi8DfAauAM7rP3+iWz5h0vqGcNw8tbwOmuuWnArdOOt9QtjuGv7aztn1p0vmGv54Mrnq8hsFTuXcDn2Jwj87TjkSGI3rrf89+BPwc8LVZ4yd02+aFJLfsaxPwzCOZZYSjqup7AFV1b5JVwMeSPItB1vlgT1X9H/Bwkq9W1XcBquoHSebNvzkwDbwVeCfwjqr6UpIfVNW/TzjXbEcl+RkGJZTqziar6vtJ9kw22mPcNvTb7JeTTFfVTJKTgUcnHW5IVdWPgOuA65Icw+A3yrXAnwNzvv5KnxZyof8e8OkkXwHu68aWAc8G1u/rD03AM4FfAR6aNR7gc0c+zj59M8mLqupLAFX1vSS/ClwBnDLRZD/xSJKnVNXDwEv3DnbXUedNoXc/1O9Jck33+ZvMz5+144GbGHwvVpITquob3WMo8+U/cYDzgb9K8i4GL3T1+ST3Mfi5P3+iyR7rMV+zqnqUwV30m5M85YgE6H5VWJCSHMXg5X1P7IbuB7Z1Z3HzQpIPAh+qqv+cY9tHq+q3JhDrcZIsYXAG/D9zbDu9qj47gVizczypqn44x/hi4ISqunUCsUZKcjZwelX94aSzjKMrn2dW1T2TzjKse2D0JAb/Oe6sqm9OONJjJDm5qu6aaIaFXOiSpJ9Y0E9blCT9hIUuSY2w0CWpERa6JDXCQpekRvw/PoUSjedMtq4AAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "user_features['age'].value_counts().plot(kind='bar')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-20T13:23:09.217654Z",
     "iopub.status.busy": "2021-07-20T13:23:09.217140Z",
     "iopub.status.idle": "2021-07-20T13:23:09.317105Z",
     "shell.execute_reply": "2021-07-20T13:23:09.316592Z",
     "shell.execute_reply.started": "2021-07-20T13:23:09.217608Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<AxesSubplot:>"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAWoAAAEACAYAAACatzzfAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/Il7ecAAAACXBIWXMAAAsTAAALEwEAmpwYAAAKfElEQVR4nO3dX4xmd13H8c+3f6goRi92JLULjNECUdA2bGq0iSFNiBtL5EqD/y5M416BkPivxgvjjdYbo0Y02Uj9Twmi0YYqhGBJLULpLBTotoCIRUvEDtCmbjBA4evFzNKlTjvPtvPM852d1yuZZJ9zzp795mT3vb89e55nqrsDwFwXrXoAAJ6aUAMMJ9QAwwk1wHBCDTCcUAMMt7RQV9XNVfVQVd274PE/XlX3VdXpqnrTsuYCOGhqWc9RV9UPJTmT5M+7+yW7HHtlkrckua67H66qb+vuh5YyGMABs7QVdXffkeTz526rqu+sqrdX1amq+ueqevH2rp9L8obufnj754o0wLb9vkd9Mslru/tlSX4xyR9ub39hkhdW1Xuq6n1VdXyf5wIY65L9+oWq6jlJfjDJX1fV2c2XnTPHlUlenuRokjuq6qXd/ch+zQcw1b6FOlur90e6+6od9j2Y5K7u/nKSf6+qj2cr3Hfv43wAI+3brY/ufjRbEf6xJKkt37e9+++ytZpOVR3J1q2QT+7XbACTLfPxvFuSvDfJi6rqwaq6IclPJbmhqj6U5HSSV20f/o4kn6uq+5LcnuSXuvtzy5oN4CBZ2uN5AOwN70wEGE6oAYZbylMfR44c6fX19WWcGuCCdOrUqc9299pO+5YS6vX19WxsbCzj1AAXpKr61JPtc+sDYDihBhhOqAGGE2qA4YQaYDihBhhOqAGGE2qA4fbz86ifkfUbb1v1CLt64KbrVz0CcAGyogYYTqgBhhNqgOGEGmA4oQYYTqgBhhNqgOGEGmA4oQYYTqgBhls41FV1cVV9sKretsyBAPh657Oifl2S+5c1CAA7WyjUVXU0yfVJ/ni54wDwRIuuqH83yS8n+eqTHVBVJ6pqo6o2Njc392I2ALJAqKvqlUke6u5TT3Vcd5/s7mPdfWxtbW3PBgQ47BZZUV+b5Eer6oEkb05yXVX95VKnAuBrdg11d/9qdx/t7vUkr07yT93900ufDIAknqMGGO+8vhVXd787ybuXMgkAO7KiBhhOqAGGE2qA4YQaYDihBhhOqAGGE2qA4YQaYDihBhhOqAGGE2qA4YQaYDihBhhOqAGGE2qA4YQaYDihBhhOqAGGE2qA4YQaYDihBhhOqAGGE2qA4YQaYDihBhhOqAGGE2qA4YQaYDihBhhOqAGGE2qA4YQaYDihBhhOqAGGE2qA4YQaYDihBhhOqAGGE2qA4XYNdVV9Q1W9v6o+VFWnq+o39mMwALZcssAxX0xyXXefqapLk9xZVf/Y3e9b8mwAZIFQd3cnObP98tLtr17mUAA8bqF71FV1cVXdk+ShJO/s7ruWOhUAX7NQqLv7K919VZKjSa6pqpc88ZiqOlFVG1W1sbm5ucdjAhxe5/XUR3c/kuT2JMd32Heyu49197G1tbU9Gg+ARZ76WKuqb93+8bOTvCLJR5c8FwDbFnnq4/Ikf1ZVF2cr7G/p7rctdywAzlrkqY8PJ7l6H2YBYAfemQgwnFADDCfUAMMJNcBwQg0wnFADDCfUAMMJNcBwQg0wnFADDCfUAMMJNcBwQg0wnFADDCfUAMMJNcBwQg0wnFADDCfUAMMJNcBwQg0wnFADDCfUAMMJNcBwQg0wnFADDCfUAMMJNcBwQg0wnFADDCfUAMMJNcBwQg0wnFADDCfUAMMJNcBwQg0wnFADDCfUAMPtGuqqel5V3V5V91XV6ap63X4MBsCWSxY45rEkv9DdH6iqb05yqqre2d33LXk2ALLAirq7/6u7P7D94/9Jcn+SK5Y9GABbzusedVWtJ7k6yV1LmQaA
/2fhUFfVc5L8TZLXd/ejO+w/UVUbVbWxubm5lzMCHGoLhbqqLs1WpP+qu/92p2O6+2R3H+vuY2tra3s5I8ChtshTH5XkjUnu7+7fWf5IAJxrkRX1tUl+Jsl1VXXP9tePLHkuALbt+nhed9+ZpPZhFgB24J2JAMMJNcBwQg0wnFADDCfUAMMJNcBwQg0wnFADDCfUAMMJNcBwQg0wnFADDCfUAMMJNcBwQg0wnFADDLfrNw7gwrN+422rHmEhD9x0/apHgBGsqAGGE2qA4YQaYDihBhhOqAGGE2qA4YQaYDihBhhOqAGGE2qA4YQaYDihBhhOqAGGE2qA4YQaYDihBhhOqAGGE2qA4YQaYDihBhhOqAGGE2qA4XYNdVXdXFUPVdW9+zEQAF9vkRX1nyY5vuQ5AHgSu4a6u+9I8vl9mAWAHbhHDTDcnoW6qk5U1UZVbWxubu7VaQEOvT0LdXef7O5j3X1sbW1tr04LcOi59QEw3CKP592S5L1JXlRVD1bVDcsfC4CzLtntgO7+if0YBICdufUBMJxQAwwn1ADDCTXAcEINMJxQAwwn1ADDCTXAcEINMJxQAwwn1ADDCTXAcEINMJxQAwwn1ADDCTXAcEINMJxQAwwn1ADDCTXAcEINMJxQAwwn1ADDCTXAcEINMJxQAwwn1ADDCTXAcEINMJxQAwwn1ADDCTXAcEINMJxQAwwn1ADDCTXAcEINMJxQAwwn1ADDCTXAcAuFuqqOV9XHquoTVXXjsocC4HGX7HZAVV2c5A1JXpHkwSR3V9Wt3X3fsoeDg2D9xttWPcJCHrjp+lWPwNO0yIr6miSf6O5PdveXkrw5yauWOxYAZ+26ok5yRZL/POf1g0m+/4kHVdWJJCe2X56pqo898/GW6kiSz+7lCeu39/JsB47rubdcz72159dzCV7wZDsWCfVCuvtkkpN7db5lq6qN7j626jkuFK7n3nI999ZBv56L3Pr4dJLnnfP66PY2APbBIqG+O8mVVfUdVfWsJK9OcutyxwLgrF1vfXT3Y1X1miTvSHJxkpu7+/TSJ1u+A3Ob5oBwPfeW67m3DvT1rO5e9QwAPAXvTAQYTqgBhhNqgOH27Dnq6arqxdl6R+UV25s+neTW7r5/dVPBlu3fn1ckuau7z5yz/Xh3v311kx08VXVNku7uu6vqu5McT/LR7v6HFY/2tB2KFXVV/Uq23vpeSd6//VVJbvEhU3urqn521TMcNFX180n+Pslrk9xbVed+RMNvrmaqg6mqfj3J7yf5o6r6rSR/kOSbktxYVb+20uGegUPx1EdVfTzJ93T3l5+w/VlJTnf3lauZ7MJTVf/R3c9f9RwHSVV9JMkPdPeZqlpP8tYkf9Hdv1dVH+zuq1c74cGxfS2vSnJZks8kOdrdj1bVs7P1r5XvXeV8T9dhufXx1STfnuRTT9h++fY+zkNVffjJdiV57n7OcoG46Oztju5+oKpenuStVfWCbF1TFvdYd38lyReq6t+6+9Ek6e7/raoD+2f9sIT69UneVVX/msc/YOr5Sb4ryWtWNdQB9twkP5zk4SdsryT/sv/jHHj/XVVXdfc9SbK9sn5lkpuTvHSlkx08X6qqb+zuLyR52dmNVfUtOcCLskNx6yNJquqibH1k67n/mXj39t++nIeqemOSP+nuO3fY96bu/skVjHVgVdXRbK0EP7PDvmu7+z0rGOtAqqrLuvuLO2w/kuTy7v7ICsZ6xg5NqAEOqkPx1AfAQSbUAMMJNcBwQg0wnFADDPd/cdUoWbwujykAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "user_features['gender'].value_counts().plot(kind='bar')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-20T13:23:09.656308Z",
     "iopub.status.busy": "2021-07-20T13:23:09.655782Z",
     "iopub.status.idle": "2021-07-20T13:23:09.754754Z",
     "shell.execute_reply": "2021-07-20T13:23:09.754307Z",
     "shell.execute_reply.started": "2021-07-20T13:23:09.656261Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<AxesSubplot:>"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAWoAAAEACAYAAACatzzfAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/Il7ecAAAACXBIWXMAAAsTAAALEwEAmpwYAAAK8klEQVR4nO3cW4xcBR3H8d+PLngBIg8dCFLKEi0QhAC6wSCGIEatQuRFDUQlEuI+gZB4w0dfjL4YeUDjBuuVSxBFCcYiUQiiUNhysxdAUkFK1C63QH0QCj8fZpYum2nnFObM/Nv9fpJNd2dOd3/JpN+cnJ5ZJxEAoK79xj0AALB7hBoAiiPUAFAcoQaA4gg1ABRHqAGguNZCbXuN7W22NzQ8/jO2N9neaPuatnYBwN7Gbd1HbfsMSdsl/SzJCQOOXSXpeklnJXnO9qFJtrUyDAD2Mq2dUSe5Q9KzCx+z/S7ba22vt/1n28f1nvqipCuTPNf7u0QaAHpGfY16RtIlSd4n6SuSvt97/BhJx9j+i+27ba8e8S4AKGtiVD/I9kGSPiDpl7bnH37Lgh2rJJ0paYWkO2yfmOT5Ue0DgKpGFmp1z96fT3Jyn+e2SlqX5GVJ/7D9qLrhvneE+wCgpJFd+kjygroR/rQkueuk3tO/UfdsWraXq3spZMuotgFAZW3ennetpLskHWt7q+2LJH1W0kW2H5S0UdK5vcNvkfSM7U2SbpP01STPtLUNAPYmrd2eBwAYDt6ZCADFEWoAKK6Vuz6WL1+eycnJNr41AOyT1q9f/3SSTr/nWgn15OSkZmdn2/jWALBPsv3Erp7j0gcAFNco1LYPsX2D7Ydtb7Z9WtvDAABdTS99XCFpbZJP2T5A0ttb3AQAWGBgqG2/Q9IZkr4gSUlekvRSu7MAAPOaXPo4WtKcpB/bvt/2VbYPbHkXAKCnSagnJL1X0g+SnCLpv5IuX3yQ7Wnbs7Zn5+bmhjwTAJauJqHeKmlrknW9r29QN9yvk2QmyVSSqU6n762AAIA3YGCok/xb0pO2j+099GFJm1pdBQB4TdO7Pi6RdHXvjo8tki5sb9Kembz8d+Oe0KrHv332uCcAGLNGoU7ygKSpdqcAAPrhnYkAUByhBoDiCDUAFEeoAaA4Qg0AxRFqACiOUANAcYQaAIoj1ABQHKEGgOIINQAUR6gBoDhCDQDFEWoAKI5QA0BxhBoAiiPUAFAcoQaA4gg1ABRHqAGgOEINAMURagAojlADQHGEGgCKI9QAUByhBoDiJpocZPtxSS9KekXSjiRTbY4CAOzUKNQ9H0rydGtLAAB9cekDAIprGupI+oPt9ban2xwEAHi9ppc+PpjkKduHSrrV9sNJ7lh4QC/g05K0cuXKIc8EgKWr0Rl1kqd6f26TdKOkU/scM5NkKslUp9MZ7koAWMIGhtr2gbYPnv9c0kclbWh7GACgq8mlj8Mk3Wh7/vhrkqxtdRUA4DUDQ51ki6STRrAFANAHt+cBQHGEGgCKI9QAUByhBoDiCDUAFEeoAaA4Qg0AxRFqACiOUANAcYQaAIoj1ABQHKEGgOIINQAUR6gBoDhCDQDFEWoAKI5QA0BxhBoAiiPUAFAcoQaA4gg1ABRHqAGgOEINAMURagAojlADQHGEGgCKaxxq28ts32/75jYHAQBeb0/OqC+VtLmtIQCA/hqF2vYKSWdLuqrdOQCAxZqeUX9P0tckvdreFABAPwNDbfscSduSrB9w3LTtWduzc3NzQxsIAEtdkzPq0yV90vbjkq6TdJbtXyw+KMlMkqkkU51OZ8gzAWDpGhjqJN9IsiLJpKTzJP0pyedaXwYAkMR91ABQ3sSeHJzkdkm3t7IEANAXZ9QAUByhBoDiCDUAFEeoAaA4Qg0AxRFqACiOUANAcYQaAIoj1ABQHKEGgOIINQAUR6gBoDhCDQDFEWoAKI5QA0BxhBoAiiPUAFAcoQaA4gg1ABRHqAGgOEINAMUR
agAojlADQHGEGgCKI9QAUByhBoDiBoba9ltt32P7QdsbbX9zFMMAAF0TDY75n6Szkmy3vb+kO23/PsndLW8DAKhBqJNE0vbel/v3PtLmKADATo2uUdteZvsBSdsk3ZpkXaurAACvaRTqJK8kOVnSCkmn2j5h8TG2p23P2p6dm5sb8kwAWLr26K6PJM9Luk3S6j7PzSSZSjLV6XSGNA8A0OSuj47tQ3qfv03SRyQ93PIuAEBPk7s+Dpf0U9vL1A379UlubncWAGBek7s+HpJ0ygi2AAD64J2JAFAcoQaA4gg1ABRHqAGgOEINAMURagAojlADQHGEGgCKI9QAUByhBoDiCDUAFEeoAaA4Qg0AxRFqACiOUANAcYQaAIoj1ABQHKEGgOIINQAUR6gBoDhCDQDFEWoAKI5QA0BxhBoAiiPUAFAcoQaA4gg1ABQ3MNS2j7R9m+1NtjfavnQUwwAAXRMNjtkh6ctJ7rN9sKT1tm9NsqnlbQAANTijTvKvJPf1Pn9R0mZJR7Q9DADQtUfXqG1PSjpF0ro+z03bnrU9Ozc3N6R5AIDGobZ9kKRfSbosyQuLn08yk2QqyVSn0xnmRgBY0hqF2vb+6kb66iS/bncSAGChJnd9WNKPJG1O8t32JwEAFmpyRn26pM9LOsv2A72PT7S8CwDQM/D2vCR3SvIItgAA+uCdiQBQHKEGgOIINQAUR6gBoDhCDQDFEWoAKI5QA0BxhBoAiiPUAFAcoQaA4gg1ABRHqAGgOEINAMURagAojlADQHGEGgCKI9QAUByhBoDiCDUAFEeoAaA4Qg0AxRFqACiOUANAcYQaAIoj1ABQHKEGgOIGhtr2GtvbbG8YxSAAwOs1OaP+iaTVLe8AAOzCwFAnuUPSsyPYAgDoY2jXqG1P2561PTs3NzesbwsAS97QQp1kJslUkqlOpzOsbwsASx53fQBAcYQaAIprcnvetZLuknSs7a22L2p/FgBg3sSgA5KcP4ohAID+uPQBAMURagAojlADQHGEGgCKI9QAUByhBoDiCDUAFEeoAaA4Qg0AxRFqACiOUANAcYQaAIoj1ABQHKEGgOIINQAUR6gBoDhCDQDFEWoAKI5QA0BxhBoAiiPUAFAcoQaA4gg1ABRHqAGgOEINAMURagAorlGoba+2/Yjtx2xf3vYoAMBOA0Nte5mkKyV9XNLxks63fXzbwwAAXU3OqE+V9FiSLUleknSdpHPbnQUAmDfR4JgjJD254Outkt6/+CDb05Kme19ut/3Im59X0nJJT4/qh/k7o/pJS8ZIXz8M3b78+h21qyeahLqRJDOSZob1/aqyPZtkatw78Mbw+u3dlurr1+TSx1OSjlzw9YreYwCAEWgS6nslrbJ9tO0DJJ0n6aZ2ZwEA5g289JFkh+2LJd0iaZmkNUk2tr6srn3+8s4+jtdv77YkXz8nGfcGAMBu8M5EACiOUANAcYQaAIob2n3U+yrbx6n7Tswjeg89JemmJJvHtwrY9/X+7R0haV2S7QseX51k7fiWjR5n1Lth++vqvmXeku7pfVjStfxyqr2b7QvHvQG7ZvtLkn4r6RJJG2wv/LUV3xrPqvHhro/dsP2opPckeXnR4wdI2phk1XiW4c2y/c8kK8e9A/3Z/puk05Jstz0p6QZJP09yhe37k5wy3oWjxaWP3XtV0jslPbHo8cN7z6Ew2w/t6ilJh41yC/bYfvOXO5I8bvtMSTfYPkrd129JIdS7d5mkP9r+u3b+YqqVkt4t6eJxjUJjh0n6mKTnFj1uSX8d/Rzsgf/YPjnJA5LUO7M+R9IaSSeOddkYEOrdSLLW9jHq/qrXhf+ZeG+SV8a3DA3dLOmg+X/sC9m+feRrsCcukLRj4QNJdki6wPYPxzNpfLhGDQDFcdcHABRHqAGgOEINAMURagAojlADQHH/B34ESI0yPoyZAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "user_features['country'].value_counts().plot(kind='bar')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-20T13:23:09.896911Z",
     "iopub.status.busy": "2021-07-20T13:23:09.896666Z",
     "iopub.status.idle": "2021-07-20T13:23:10.106002Z",
     "shell.execute_reply": "2021-07-20T13:23:10.105517Z",
     "shell.execute_reply.started": "2021-07-20T13:23:09.896890Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<AxesSubplot:>"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYkAAAD7CAYAAACfQGjDAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/Il7ecAAAACXBIWXMAAAsTAAALEwEAmpwYAAAZtklEQVR4nO3df7RdZXng8e9DAAsiJIRrpEkkTI0yaAfEFNLqjI60cJEuw+qgo50lkUXNzBLRqrNKHJ2VFvwRZ83SkalmFpVookWkWEumJsQUcJzONJALYgIE5BpBkuFHmiDUoRXBZ/7Yb6abw3nPPUluzr3J/X7W2uu++9nP2e8+5+6zn7Pfvc+9kZlIktTNYRO9AZKkycsiIUmqskhIkqosEpKkKouEJKnKIiFJqjp8ojdgvJ1wwgk5b968id4MSTqo3HHHHX+bmUOd8UOuSMybN4+RkZGJ3gxJOqhExEPd4g43SZKqLBKSpCqLhCSpyiIhSaqySEiSqiwSkqQqi4QkqcoiIUmqOuS+TLfHvKXfekHsweXnT8CWSNLByzMJSVKVRUKSVGWRkCRVWSQkSVUWCUlSVV9FIiKmR8QNEXFfRGyNiF+PiOMjYkNEPFB+zii5ERFXRcRoRGyOiDNa61lc8h+IiMWt+OsiYkt5zFURESXetQ9J0mD0eybxOeCmzDwFOA3YCiwFbs7M+cDNZR7gPGB+mZYAK6A54APLgLOAM4FlrYP+CuA9rccNl3itD0nSAIxZJCLiOOBfANcAZOYzmfkTYBGwqqStAi4o7UXA6mxsBKZHxInAucCGzNydmU8AG4DhsuzYzNyYmQms7lhXtz4kSQPQz5nEycBO4EsR8b2I+GJEvBiYlZmPlJxHgVmlPRt4uPX47SXWK769S5wefUiSBqCfInE4cAawIjNfC/xfOoZ9yhlAjv/m9ddHRCyJiJGIGNm5c+eB3AxJmlL6KRLbge2ZeVuZv4GmaDxWhoooPx8vy3cAc1uPn1NiveJzusTp0cfzZObVmbkgMxcMDb3g/3hLkvbRmEUiMx8FHo6IV5XQ2cC9wBpgzx1Ki4EbS3sNcFG5y2kh8GQZMloPnBMRM8oF63OA9WXZUxGxsNzVdFHHurr1IUkagH7/wN9lwJ9GxJHANuBimgJzfURcAjwEvL3krgXeAowCT5dcMnN3RFwJbCp5V2Tm7tJ+L/Bl4ChgXZkAllf6kCQNQF9FIjPvAhZ0WXR2l9wELq2sZyWwskt8BHhNl/iubn1IkgbDb1xLkqosEpKkKouEJKnKIiFJqrJISJKqLBKSpCqLhCSpyiIhSaqySEiSqiwSkqQqi4QkqcoiIUmqskhIkqosEpKkKouEJKnKIiFJqrJISJKqLBKSpCqLhCSpyiIhSaqySEiSqiwSkqQqi4QkqcoiIUmq6qtIRMSDEbElIu6KiJESOz4iNkTEA+XnjBKPiLgqIkYjYnNEnNFaz+KS/0BELG7FX1fWP1oeG736kCQNxt6cSfzLzDw9MxeU+aXAzZk5H7i5zAOcB8wv0xJgBTQHfGAZcBZwJrCsddBfAbyn9bjhMfqQJA3A/gw3LQJWlfYq4IJWfHU2NgLTI+JE4FxgQ2buzswngA3AcFl2bGZuzMwEVnesq1sfkqQB6LdIJPDtiLgjIpaU2KzMfKS0HwVmlfZs4OHWY7eXWK/49i7xXn1Ikgbg8D7z3pCZOyLipcCGiLivvTAzMyJy/Devvz5K4VoC8PKXv/xAboYkTSl9nUlk5o7y83HgmzTXFB4rQ0WUn4+X9B3A3NbD55RYr/icLnF69NG5fVdn5oLMXDA0NNTPU5Ik9WHMIhERL46Il+xpA+cAdwNrgD13KC0GbiztNcBF5S6nhcCTZchoPXBORMwoF6zPAdaXZU9FxMJyV9NFHevq1ockaQD6GW6aBXyz3JV6OHBtZt4UEZuA6yPiEuAh4O0lfy3wFmAUeBq4GCAzd0fElcCmkndFZu4u
7fcCXwaOAtaVCWB5pQ9J0gCMWSQycxtwWpf4LuDsLvEELq2sayWwskt8BHhNv31IkgbDb1xLkqosEpKkKouEJKnKIiFJqrJISJKqLBKSpCqLhCSpyiIhSaqySEiSqiwSkqQqi4QkqcoiIUmq6vefDh3S5i391gtiDy4/fwK2RJImF88kJElVFglJUpXDTXvBYSlJU41nEpKkKouEJKnKIiFJqrJISJKqLBKSpCqLhCSpyiIhSaqySEiSqvouEhExLSK+FxF/WeZPjojbImI0Ir4eEUeW+IvK/GhZPq+1jo+U+P0RcW4rPlxioxGxtBXv2ockaTD25kziA8DW1vyngc9m5iuAJ4BLSvwS4IkS/2zJIyJOBd4BvBoYBr5QCs804PPAecCpwDtLbq8+JEkD0FeRiIg5wPnAF8t8AG8Gbigpq4ALSntRmacsP7vkLwKuy8yfZeaPgFHgzDKNZua2zHwGuA5YNEYfkqQB6PdM4r8AfwD8oszPBH6Smc+W+e3A7NKeDTwMUJY/WfL/f7zjMbV4rz4kSQMwZpGIiN8GHs/MOwawPfskIpZExEhEjOzcuXOiN0eSDhn9nEm8HnhrRDxIMxT0ZuBzwPSI2PNXZOcAO0p7BzAXoCw/DtjVjnc8phbf1aOP58nMqzNzQWYuGBoa6uMpSZL6MWaRyMyPZOaczJxHc+H5lsz8N8CtwIUlbTFwY2mvKfOU5bdkZpb4O8rdTycD84HbgU3A/HIn05GljzXlMbU+JEkDsD/fk7gc+FBEjNJcP7imxK8BZpb4h4ClAJl5D3A9cC9wE3BpZj5Xrjm8D1hPc/fU9SW3Vx+SpAHYq386lJnfAb5T2tto7kzqzPkH4G2Vx38C+ESX+FpgbZd41z4kSYPhN64lSVUWCUlSlUVCklRlkZAkVVkkJElVFglJUpVFQpJUZZGQJFVZJCRJVRYJSVKVRUKSVGWRkCRVWSQkSVV79Vdg1Z95S7/1gtiDy8+fgC2RpP1jkZhAFhNJk53DTZKkKouEJKnKIiFJqrJISJKqLBKSpCqLhCSpyiIhSaqySEiSqvwy3UHAL91JmihjnklExC9FxO0R8f2IuCci/qjET46I2yJiNCK+HhFHlviLyvxoWT6vta6PlPj9EXFuKz5cYqMRsbQV79qHJGkw+hlu+hnw5sw8DTgdGI6IhcCngc9m5iuAJ4BLSv4lwBMl/tmSR0ScCrwDeDUwDHwhIqZFxDTg88B5wKnAO0suPfqQJA3AmEUiGz8ts0eUKYE3AzeU+CrggtJeVOYpy8+OiCjx6zLzZ5n5I2AUOLNMo5m5LTOfAa4DFpXH1PqQJA1AXxeuyyf+u4DHgQ3AD4GfZOazJWU7MLu0ZwMPA5TlTwIz2/GOx9TiM3v00bl9SyJiJCJGdu7c2c9TkiT1oa8ikZnPZebpwByaT/6nHMiN2luZeXVmLsjMBUNDQxO9OZJ0yNirW2Az8yfArcCvA9MjYs/dUXOAHaW9A5gLUJYfB+xqxzseU4vv6tGHJGkA+rm7aSgippf2UcBvAVtpisWFJW0xcGNprynzlOW3ZGaW+DvK3U8nA/OB24FNwPxyJ9ORNBe315TH1PqQJA1AP9+TOBFYVe5COgy4PjP/MiLuBa6LiI8D3wOuKfnXAF+JiFFgN81Bn8y8JyKuB+4FngUuzcznACLifcB6YBqwMjPvKeu6vNKHJGkAxiwSmbkZeG2X+Daa6xOd8X8A3lZZ1yeAT3SJrwXW9tuHJGkw/LMckqQqi4QkqcoiIUmqskhIkqosEpKkKouEJKnKIiFJqrJISJKqLBKSpCr/fekhxH9zKmm8eSYhSaryTGIK8oxDUr88k5AkVVkkJElVFglJUpVFQpJU5YVr9eRFbmlq80xCklTlmYTGhWcc0qHJMwlJUpVFQpJUZZGQJFV5TUID5bUL6eDimYQkqWrMIhERcyPi1oi4NyLuiYgPlPjxEbEhIh4oP2eUeETEVRExGhGbI+KM1roWl/wH
ImJxK/66iNhSHnNVRESvPiRJg9HPcNOzwIcz886IeAlwR0RsAN4N3JyZyyNiKbAUuBw4D5hfprOAFcBZEXE8sAxYAGRZz5rMfKLkvAe4DVgLDAPryjq79aFDnMNS0uQw5plEZj6SmXeW9t8BW4HZwCJgVUlbBVxQ2ouA1dnYCEyPiBOBc4ENmbm7FIYNwHBZdmxmbszMBFZ3rKtbH5KkAdiraxIRMQ94Lc0n/lmZ+UhZ9Cgwq7RnAw+3Hra9xHrFt3eJ06OPzu1aEhEjETGyc+fOvXlKkqQe+i4SEXEM8A3g9zPzqfaycgaQ47xtz9Orj8y8OjMXZOaCoaGhA7kZkjSl9FUkIuIImgLxp5n55yX8WBkqovx8vMR3AHNbD59TYr3ic7rEe/UhSRqAMS9clzuNrgG2ZuZnWovWAIuB5eXnja34+yLiOpoL109m5iMRsR74ZOsOpXOAj2Tm7oh4KiIW0gxjXQT81zH6kAAvcEsHWj93N70eeBewJSLuKrH/QHPgvj4iLgEeAt5elq0F3gKMAk8DFwOUYnAlsKnkXZGZu0v7vcCXgaNo7mpaV+K1PiRJAzBmkcjMvwaisvjsLvkJXFpZ10pgZZf4CPCaLvFd3fqQJA2G37iWJFVZJCRJVRYJSVKVRUKSVGWRkCRVWSQkSVX+0yFNCX7pTto3FgmpxWIiPZ9FQtpHFhRNBRYJ6QCzmOhg5oVrSVKVRUKSVGWRkCRVWSQkSVVeuJYmCS9wazLyTEKSVGWRkCRVOdwkHWQcltIgeSYhSaqySEiSqhxukg5RDktpPHgmIUmqskhIkqocbpKmOIel1MuYZxIRsTIiHo+Iu1ux4yNiQ0Q8UH7OKPGIiKsiYjQiNkfEGa3HLC75D0TE4lb8dRGxpTzmqoiIXn1Ikgann+GmLwPDHbGlwM2ZOR+4ucwDnAfML9MSYAU0B3xgGXAWcCawrHXQXwG8p/W44TH6kCQNyJjDTZn53YiY1xFeBLyptFcB3wEuL/HVmZnAxoiYHhEnltwNmbkbICI2AMMR8R3g2MzcWOKrgQuAdT36kDQBHJaamvb1wvWszHyktB8FZpX2bODhVt72EusV394l3quPF4iIJRExEhEjO3fu3IenI0nqZr/vbipnDTkO27LPfWTm1Zm5IDMXDA0NHchNkaQpZV+LxGNlGIny8/ES3wHMbeXNKbFe8Tld4r36kCQNyL4WiTXAnjuUFgM3tuIXlbucFgJPliGj9cA5ETGjXLA+B1hflj0VEQvLXU0XdayrWx+SpAEZ88J1RHyN5gLyCRGxneYupeXA9RFxCfAQ8PaSvhZ4CzAKPA1cDJCZuyPiSmBTybtiz0Vs4L00d1AdRXPBel2J1/qQNMn1e5Hbi+GTXz93N72zsujsLrkJXFpZz0pgZZf4CPCaLvFd3fqQJA2O37iWNOl5xjFx/NtNkqQqi4QkqcrhJkmHDIelxp9FQtKUYzHpn8NNkqQqi4QkqcoiIUmqskhIkqosEpKkKu9ukqQK74LyTEKS1INFQpJUZZGQJFVZJCRJVRYJSVKVdzdJ0n7qdhcUHBr/jc8zCUlSlUVCklTlcJMkTUKTZVjKMwlJUpVFQpJU5XCTJB3EDvSwlGcSkqSqSV8kImI4Iu6PiNGIWDrR2yNJU8mkHm6KiGnA54HfArYDmyJiTWbeO7FbJkkHl30dlprsZxJnAqOZuS0znwGuAxZN8DZJ0pQRmTnR21AVERcCw5n5e2X+XcBZmfm+jrwlwJIy+yrg/o5VnQD8bR9dTva8iex7sudNZN8+58mXN5F9T/a8Wu5JmTn0gszMnLQTcCHwxdb8u4A/3of1jBwKeQfDNvra+JwnQ97BsI0Hw2uTmZN+uGkHMLc1P6fEJEkDMNmLxCZgfkScHBFHAu8A1kzwNknSlDGp727KzGcj4n3AemAasDIz79mHVV19iORNZN+TPW8i+/Y5T768iex7suftVe6kvnAtSZpYk324SZI0gSwSkqQqi4Qk
qWpSX7jeVxFxCs03s2eX0A5gTWZu3Y/1zQZuy8yftuLDmXlTa/5MIDNzU0ScCgwD92Xm2jHWvzozLxoj5w0030C/OzO/3bHsLGBrZj4VEUcBS4EzgHuBT2bmkyXv/cA3M/PhMfracyfZ/8nMv4qI3wV+A9gKXJ2ZP2/l/hPgd2huVX4O+AFwbWY+1asPaV9ExEsz8/FxXN/MzNw1Xus7FB1yZxIRcTnNn+8I4PYyBfC1fv9AYERc3Gq/H7gRuAy4OyLafxbkk628ZcBVwIqI+BTwx8CLgaUR8dFW3pqO6b8Dv7NnvpV3e6v9nrK+lwDLujyPlcDTpf054Djg0yX2pVbelcBtEfE/I+K9EfHCb1c2vgScD3wgIr4CvA24Dfg14Isdr81/A36pLHsRTbHYGBFvqqz7oBERLx3n9c0cz/Xtr4g4LiKWR8R9EbE7InZFxNYSm97nOta12sdGxKci4ivlg0U77wut9ssiYkVEfD4iZkbEH0bEloi4PiJObOUd3zHNBG6PiBkRcXwrb7jjOV0TEZsj4tqImNVatjwiTijtBRGxjeb98FBEvLFje++MiI9FxK+M8fwXRMStEfHViJgbERsi4smI2BQRr23lHRMRV0TEPWX5zojYGBHv7ljf4RHxbyPipvIcNkfEuoj4dxFxRK9taa3j6lZ7WlnflRHx+o68j/Wzvr6+cXcwTTSfZI/oEj8SeKDPdfy41d4CHFPa84AR4ANl/nsdedOAo4GngGNL/ChgcyvvTuCrwJuAN5afj5T2G1t57XVvAoZK+8XAlo7t3dpef8eyu9rrpPlgcA5wDbATuAlYDLyklbe5/DwceAyYVuaj47lsaS07GvhOab+8Y/uPA5YD9wG7gV00ZyXLgel78btd12ofC3wK+Arwux15X2i1XwasoPlDkTOBPyzbfT1wYivv+I5pJvAgMAM4vpU33PG8rgE2A9cCs1rLlgMnlPYCYBswCjzU8Xu+E/gY8Ct9PP8FwK1l/5kLbACeLPvHa1t5xwBXAPeU5TuBjcC7O9a3HrgceFnH63U58O1W7IzK9DrgkVbeN8rzvoDm+0zfAF7UuV+Wfe4ymjPezaW/uSV2YyvvF8CPOqafl5/buu3zNB9iPg6cBHwQ+Iv2/tpq3wr8Wmm/ko5vIJc+/jPwY5oPmh8EfrnL7+R24DzgncDDwIUlfjbwN628G4F303wh+EPAfwTmA6tozvb35H2NZn9dWHLnlPYK4Os99tf2fru94/W4Fvh94A7gM7VjRXW/6/cNerBMNAeik7rETwLub81vrkxbgJ+18u7pWM8xZSf/DB0H4G7tMt/OO6zscBuA00tsW5ft/T7NAWpmlx24c/1/Blxc2l8CFrR2/k21nQI4Anhr2TF3tuJ30xTVGcDfUQ6SNGcM7YK0hX88CMxobyfNsNheHYxKfEodkOjzYFRyx/uAdH+3fjqX0Qwj3lKeR+f099328zL/UeB/0ezD7det/V75ccdj2u+VD5ff36+2X68u23pnj21or28rcHhpb+zI6/zg1V7nPwe+ADxanvOSPp9Le9n3O5Ztah0P7mvFf9Djd/KDVvs5mg8e7f11z/wzrbz2h7rDab4f8ec0Z/3fq/X1vH77STqYJprrAKPAuvKCXF12tFGe/ynwMeB0mjd4e5pHMxa/J+8WysG848VeDTzXit0GHL3nF9+KH0eXik3zBv4zmmGkH3dZ/mDrl76N8qmXpkh1vhGOA74M/LBsx8/LY/4HcFq3nbZLf0e32h8sj38IeD9wM/AnNEVhWSvvAzQH3j+hKc57CtUQ8N1WXl8Ho9bOP2UOSPR5MOrjubSX9XtA+jbwBzz/DGgWTSH9q1bsbmB+5ff3cMdzPqxj+btpzmge6rZ9wMdrr03H++QzNMOt3T5Qbacphh8u+220lrUPkpeV5/xmmjPKz9Gcwf8R8JXa77kVm0ZzfPlSK/Y3NGfmb6N5v1xQ4m/k+R8G/jfwhtJ+K7C+23uA5ozvbTz/
GHIY8K9pronuiT0AvLyP38l9XZYvo3mv9Dey0k/SwTaVF3Uh8K/KtJAyLNLKuWbPL63L46/t2ElfVsl7fav9okrOCbQOPF2Wn0/r010fz+1o4OTKsmOB02g+dc/qsvyVe9HPL1M+0QLTaf7Y4pld8l5dlp3SY119HYxKfEodkOjzYFTi431AmkFz7eo+4AmaocCtJdYeYrsQeFXld3JBq/2fgN/skjNM64BEMxR2TJe8VwA3VPp5K80B9NEuy5Z1THuGZl8GrO7IfRPwdZqh1y3AWpq/IH1ER951fb5PTqM5U14HnFJ+zz8p++FvdOTdXl7nv97zetJ8oHp/K29e2b7HaYbOf1DaX6f1vgcupfUBsGObLmu1v0rrw3Er/nvAz/t6jv0kOTntz9RxMNrdcTCa0ZE7FQ5Ih7dy+joYldx+D0j/rOOA9MoSf94BqcROAX6z8zXqPLCUvLP3I++8/V0fzfW91xyg7et2IO13nf90L/L6ea3PormTcSbweuDfA2/psn1n8o/DmKfSfHjZ57zqftdvopPTgZgoQ1TjmTseeR0HpIH1O8jXhmYo8X7gL2iGNxe1lt25D3mXjXPeePfb1/r2YZ33jWPeMpoPKCM0N2bcTHNN6bvAR3vk3bI/eT33mX53QienAzHR5XrM/uZOtbx9XSd7d+felMmbBNvYz12S45rXazokv0ynySUiNtcW0Vyb2OvcqZZ3gNZ5WJYvh2bmg+W7LTdExEkld6rmTWTfz2bmc8DTEfHDLF9Kzcy/j4hfHMC8KouEBmEWcC7NGHlb0Fxk3ZfcqZZ3INb5WEScnpl3AWTmTyPit2m+nPmrUzhvIvt+JiKOzsynaW5AAZovCdLcpn2g8ur6Od1wctqfiT7vJNub3KmWd4D67vfOvSmVN8Hb2NddkuOd12vy/0lIkqoOub/dJEkaPxYJSVKVRUKSVGWRkCRVWSQkSVX/DxmI52cvCKbZAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
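   ],
   "source": [
    "# Bar chart of user counts per province. value_counts() sorts by count (descending) and skips missing values by default.\n",
    "user_features['province'].value_counts().plot(kind='bar')"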
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-20T13:23:10.225283Z",
     "iopub.status.busy": "2021-07-20T13:23:10.224765Z",
     "iopub.status.idle": "2021-07-20T13:23:13.044562Z",
     "shell.execute_reply": "2021-07-20T13:23:13.043957Z",
     "shell.execute_reply.started": "2021-07-20T13:23:10.225238Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<AxesSubplot:>"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAY4AAAEBCAYAAABv4kJxAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/Il7ecAAAACXBIWXMAAAsTAAALEwEAmpwYAAAfCklEQVR4nO3df5Bd5X3f8fdXu5aQhNEPfqhCUlgZNraBmhqr/IjdhhgXBLgRnVAHmjFrR7HaMeRHpzNBpJ3SYneK00wApzYT1VIQHmqZIU5RirEiA54kkwpLAgcsfi4/hKQAwugXRpa0u/r2j+c5vs9e3Xv3PPfe3bs/Pq+ZO3vvc55zznPOPef5nvM8z7lr7o6IiEhZ0zpdABERmVgUOEREJIsCh4iIZFHgEBGRLAocIiKSpbvTBWi30047zXt6ejpdDBGRCWX79u0/cffTy+SddIGjp6eHbdu2dboYIiITipntLJtXTVUiIpJFgUNERLIocIiISBYFDhERyaLAISIiWRQ4REQkiwKHiIhkUeAQEZEsChwiIpJl0gWOZ/Yc7HQRREQmtUkXOEREZHQpcIiISBYFDhERyaLAISIiWRQ4REQkiwKHiIhkUeAQEZEsChwiIpJFgUNERLIocIiISBYFDhERyTJi4DCzdWa218x+nKT9DzN73syeNrO/MLO5ybRbzazfzF4wsyuT9OUxrd/MVifpS83siZj+bTObHtNnxM/9cXpPuzZaRESaV+aO415geVXaZuB8d/8I8CJwK4CZnQtcD5wX5/m6mXWZWRfwNeAq4FzghpgX4CvAne5+DrAfWBnTVwL7Y/qdMZ+IiHTYiIHD3f8a2FeV9lfuPhg/bgEWx/crgA3uftTdXwX6gYviq9/dX3H3Y8AGYIWZGfBJ4ME4/3rg2mRZ6+P7B4HLY34REemgdvRx/CbwSHy/CNiVTNsd0+qlnwocSIJQkT5sWXH6wZj/BGa2ysy2mdm2ocMH6Vn9cMsbJSIitbUUOMzsPwKDwP3tKU5z3H2Nuy9z92Vds+Z0sigiIpNed7MzmtnngE8Dl7u7x+Q9wJIk2+KYRp30d4C5ZtYd7yrS/MWydptZNzAn5hcRkQ5q6o7DzJYDvw/8qrsfTiZtBK6PI6KWAr3AD4GtQG8cQTWd0IG+MQacx4Hr4vx9wEPJsvri++uAx5IAJSIiHTLiHYeZfQu4DDjNzHYDtxFGUc0ANsf+6i3u/u/cfYeZPQA8S2jCusndh+JybgY2AV3AOnffEVdxC7DBzL4MPAWsjelrgW+aWT+hc/76NmyviIi0yCbbRfyMhb2+sO8uXrvjmk4XRURkwjCz7e6+rExePTkuIiJZFDhERCSLAoeIiGRR4BARkSyTNnDo6XERkdExaQMHKHiIiIyGSR04RESk/RQ4REQkiwKHiIhkUeAQEZEskz5wqINcRKS9Jn3gAAUPEZF2mhKBQ0RE2keBQ0REsihwiIhIFgUOERHJosAhIiJZFDhERCSLAoeIiGRR4BARkSwKHCIikkWBQ0REsihwiIhIlhEDh5mtM7O9ZvbjJG2+mW02s5fi33kx3czsq2bWb2ZPm9mFyTx9Mf9LZtaXpH/MzJ6J83zVzKzROkREpLPK3HHcCyyvSlsNPOruvcCj8TPAVUBvfK0C7oEQBIDbgIuBi4DbkkBwD/CFZL7lI6xDREQ6aMTA4e5/DeyrSl4BrI/v1wPXJun3ebAFmGtmC4Ergc3uvs/d9wObgeVx2inuvsXdHbivalm11iEiIh3UbB/HAnd/I75/E1gQ3y8CdiX5dse0Rum7a6Q3WscJzGyVmW0zs21Dhw82sTkiIlJWy53j8U7B21CWptfh7mvcfZm7L+uaNWc0iyIiMuU1Gzjeis1MxL97Y/oeYEmSb3FMa5S+uEZ6o3WIiEgHNRs4NgLFyKg+4KEk/cY4uuoS4GBsbtoEXGFm82Kn+BXApjjtkJldEkdT3Vi1rFrrEBGRDuoe
KYOZfQu4DDjNzHYTRkfdATxgZiuBncBnYvbvAlcD/cBh4PMA7r7PzL4EbI35bnf3osP9i4SRWzOBR+KLBusQEZEOstB9MHnMWNjrC/vuOiH9tTuuGfvCiIhMEGa23d2Xlck7ZZ4c71n9cKeLICIyKUyZwCEiIu2hwCEiIlkUOEREJIsCh4iIZFHgEBGRLAocIiKSRYFDRESyKHCIiEgWBQ4REcmiwCEiIlkUOEREJIsCh4iIZFHgEBGRLAocIiKSRYFDRESyKHCIiEgWBQ4REcmiwCEiIlkUOEREJIsCh4iIZFHgEBGRLAocIiKSpaXAYWb/3sx2mNmPzexbZnaSmS01syfMrN/Mvm1m02PeGfFzf5zekyzn1pj+gpldmaQvj2n9Zra6lbIC9Kx+uNVFiIhMeU0HDjNbBPwOsMzdzwe6gOuBrwB3uvs5wH5gZZxlJbA/pt8Z82Fm58b5zgOWA183sy4z6wK+BlwFnAvcEPOKiEgHtdpU1Q3MNLNuYBbwBvBJ4ME4fT1wbXy/In4mTr/czCymb3D3o+7+KtAPXBRf/e7+irsfAzbEvCIi0kFNBw533wP8EfA6IWAcBLYDB9x9MGbbDSyK7xcBu+K8gzH/qWl61Tz10k9gZqvMbJuZbRs6fLDZTRIRkRJaaaqaR7gDWAqcCcwmNDWNOXdf4+7L3H1Z16w5nSiCiMiU0UpT1aeAV939bXcfAL4DfByYG5uuABYDe+L7PcASgDh9DvBOml41T710ERHpoFYCx+vAJWY2K/ZVXA48CzwOXBfz9AEPxfcb42fi9Mfc3WP69XHU1VKgF/ghsBXojaO0phM60De2UF4REWmD7pGz1ObuT5jZg8CTwCDwFLAGeBjYYGZfjmlr4yxrgW+aWT+wjxAIcPcdZvYAIegMAje5+xCAmd0MbCKM2Frn7juaLa+IiLSHhYv+yWPGwl5f2HdX3emv3XHN2BVGRGSCMLPt7r6sTN4p9+S4HgIUEWnNlAscoOAhItKKKRk4RESkeQocIiKSRYFDRESyKHCIiEgWBQ4REcmiwCEiIlmmbODQkFwRkeZM2cAhIiLNUeAQEZEsChwiIpJlSgcO9XOIiOSb0oFDRETyKXCIiEgWBQ4REcky5QOH+jlERPJM+cAhIiJ5FDhERCSLAoeIiGRR4ED9HCIiORQ4REQkiwKHiIhkaSlwmNlcM3vQzJ43s+fM7FIzm29mm83spfh3XsxrZvZVM+s3s6fN7MJkOX0x/0tm1pekf8zMnonzfNXMrJXyiohI61q947gb+J67fwi4AHgOWA086u69wKPxM8BVQG98rQLuATCz+cBtwMXARcBtRbCJeb6QzLe8xfKKiEiLmg4cZjYH+OfAWgB3P+buB4AVwPqYbT1wbXy/ArjPgy3AXDNbCFwJbHb3fe6+H9gMLI/TTnH3Le7uwH3JskREpENaueNYCrwN/JmZPWVm3zCz2cACd38j5nkTWBDfLwJ2JfPvjmmN0nfXSD+Bma0ys21mtm3o8MGmNkYjq0REymklcHQDFwL3uPtHgfeoNEsBEO8UvIV1lOLua9x9mbsv65o1Z7RXJyIypbUSOHYDu939ifj5QUIgeSs2MxH/7o3T9wBLkvkXx7RG6YtrpIuISAc1HTjc/U1gl5l9MCZdDjwLbASKkVF9wEPx/Ubgxji66hLgYGzS2gRcYWbzYqf4FcCmOO2QmV0SR1PdmCxLREQ6pLvF+X8buN/MpgOvAJ8nBKMHzGwlsBP4TMz7XeBqoB84HPPi7vvM7EvA1pjvdnffF99/EbgXmAk8El8iItJBFrohJo8ZC3t9Yd9dTc//2h3XtK8wIiIThJltd/dlZfLqyXEREcmiwCEiIlkUOEREJIsCh4iIZFHgEBGRLAocIiKSRYFDRESyKHCIiEgWBQ4REcmiwCEiIlkUOEREJIsCh4iIZFHgEBGRLAocIiKSRYFDRESyKHCIiEgWBQ4REcmiwCEiIlkUOEREJIsCh4iIZFHg
EBGRLAocIiKSRYFDRESytBw4zKzLzJ4ys/8bPy81syfMrN/Mvm1m02P6jPi5P07vSZZxa0x/wcyuTNKXx7R+M1vdallFRKR17bjj+F3gueTzV4A73f0cYD+wMqavBPbH9DtjPszsXOB64DxgOfD1GIy6gK8BVwHnAjfEvCIi0kEtBQ4zWwxcA3wjfjbgk8CDMct64Nr4fkX8TJx+ecy/Atjg7kfd/VWgH7govvrd/RV3PwZsiHlFRKSDWr3juAv4feB4/HwqcMDdB+Pn3cCi+H4RsAsgTj8Y8/88vWqeeuknMLNVZrbNzLYNHT7Y4iaJiEgjTQcOM/s0sNfdt7exPE1x9zXuvszdl3XNmtPp4oiITGrdLcz7ceBXzexq4CTgFOBuYK6Zdce7isXAnph/D7AE2G1m3cAc4J0kvZDOUy9dREQ6pOk7Dne/1d0Xu3sPoXP7MXf/DeBx4LqYrQ94KL7fGD8Tpz/m7h7Tr4+jrpYCvcAPga1AbxylNT2uY2Oz5RURkfZo5Y6jnluADWb2ZeApYG1MXwt808z6gX2EQIC77zCzB4BngUHgJncfAjCzm4FNQBewzt13jEJ5RUQkg4WL/sljxsJeX9h3V9Pzv3bHNe0rjIjIBGFm2919WZm8enJcRESyKHCIiEgWBQ4REcmiwCEiIlkUOEREJIsCh4iIZFHgEBGRLAocIiKSRYFDRESyKHCIiEgWBQ4REcmiwCEiIlkUOEREJIsCh4iIZFHgEBGRLAocIiKSRYFjgulZ/XCniyAiU5wCxwTUs/phBRAR6ZjR+J/jMkbS4KF/eSsiY0V3HJOE7kBEZKwocEwiCh4iMhbUVDXJVAcPNWGJSLspcExyCiQi0m5NBw4zWwLcBywAHFjj7neb2Xzg20AP8BrwGXffb2YG3A1cDRwGPufuT8Zl9QH/KS76y+6+PqZ/DLgXmAl8F/hdd/dmyyz1m7MUUESkrFb6OAaB/+Du5wKXADeZ2bnAauBRd+8FHo2fAa4CeuNrFXAPQAw0twEXAxcBt5nZvDjPPcAXkvmWt1BeaUD9IyJSVtN3HO7+BvBGfP+umT0HLAJWAJfFbOuBHwC3xPT74h3DFjOba2YLY97N7r4PwMw2A8vN7AfAKe6+JabfB1wLPNJsmaUxNWuJSBlt6eMwsx7go8ATwIIYVADeJDRlQQgqu5LZdse0Rum7a6TXWv8qwl0MXaec3sKWSKrMXYiCi8jU03LgMLOTgT8Hfs/dD4WujMDd3cxGvU/C3dcAawBmLOxVH8gY0l2KyNTT0nMcZvY+QtC4392/E5Pfik1QxL97Y/oeYEky++KY1ih9cY10ERHpoKYDRxwltRZ4zt3/OJm0EeiL7/uAh5L0Gy24BDgYm7Q2AVeY2bzYKX4FsClOO2Rml8R13ZgsS0REOqSVpqqPA58FnjGzH8W0PwDuAB4ws5XATuAzcdp3CUNx+wnDcT8P4O77zOxLwNaY7/aioxz4IpXhuI+gjnERkY5rZVTV3wJWZ/LlNfI7cFOdZa0D1tVI3wac32wZRUSk/fRbVSIikkWBQ0REsihwyJShf4Al0h76kUOZcsoGDz2TIlKb7jhE6tAdikhtuuMQGYF+UVhkOAUOkSaNdDeiwCKTlQKHyChRYJHJSoFDpEP0A5EyUalzXEREsihwiIhIFgUOERHJoj4OESkl7ZNRf8zUpsAhItna+WCkgtDEo8AhIh3VKAgpqIxPChwiMm4pqIxPChwiMiHpOZjO0agqERHJosAhIiJZFDhERCSL+jhERMbAZOqTUeAQEemAWiPGJkowUeAQERknJsrw43EfOMxsOXA30AV8w93v6HCRRETGXM7T+qMdZMZ14DCzLuBrwL8AdgNbzWyjuz/b2ZKJiIxfo/1PxMZ14AAuAvrd/RUAM9sArAAUOEREmtRqR725ezvL01Zmdh2w3N1/K37+LHCxu99clW8VsCp+PB94c0wLOrG9H3i304WYoLTvWqP915p277/Z7n56
mYzj/Y6jFHdfA6wBMLNtHS7ORHM68GqnCzFBad+1RvuvNW3df+7eUzbveH8AcA+wJPm8OKaJiEiHjPfAsRXoNbOlZjYduB7Y2OEyiYhMaeO6qcrdB83sZmATYTjuOnffMcJsa0a/ZJPKPwP+ptOFmKC071qj/deaju2/cd05LiIi4894b6oSEZFxRoFDRESyKHCIiEiWcd05XoaZfYjwNPn5hOG6LwCPA/8UOAL8FrAXmAMMAT8FdgA3ufu+JtZ3hrvvNbNT3f2d9mxFe9Zbtmzt3oZO7RORicLM5gNU1zlmdqG7P5l8Ps3dfzLe1zuhO8fN7BbgBuB14FeAmYTRVw5Y8hfgeHztIzxZfhZwrbv/IC7rDOAo8F+BqwnPj0wHjgHPxc9zgPfF5ZIs2wkB6XvABwkBbG5cXxqchwhDjGcleQZjHkvyeZ311vJmnD8tm1XlGQSeaXIbBmP+dJkDwI9HWN57wB+6++11yi2jzMwWAIvixz3u/paZzSU8cfwxYBfwESrfYz/wFPAk4UJsGfCLhPPiE4Tj7G+A/5Pk6SGce/3AAmA+cBLhAu4v3P35Ous8Ka7z74Enmyhbut4PxvVOB15sUL6cfN2E4/pHwIYmy/cvgUvj+6OE8x7g/xHO718i1FkvAuuB34z75X3AF939z5vcdwPAfwY+TLh4fj+hXtxSZ72fjftkBtDn7o8ygokeOF4EziN8WdMIX87ZwCmESnqQsDPeJey84zHf8biIo4QKLq2Ya1W89RwlHGDTRpinWG8jR2KeY8DJdfIUZRtpeU7Y9mmEA6aRstswGKfXW15xIBVXLX8C/A7wv4A/ZPQrhLIn+mhXWMcIQfVz5FW67aqsfgX4GXAjoXKYSSWwHyOcDyMd42WO1zJ5iOucXmKdR9tYttHIV3Y7yiyv0TLSaWX3SZl8Q9Q/x98GbiIct/e7+4WNCg8TP3A8D1xJuLpx4CVClK1ugvtZTHPClz+SYqfsJVxFVRtg+B3AEUKFVGa51V/cLsLVfZH+M8IXPKPE8modpNVl2wucUWPesttQ3BEV/gE4s0a+1wnbUV2eWnd/tXTqRG93hVUm71hUVvX2e61lHY9pQ9Rvvq41X61lFxdlXSXyFWW2Ovnqla1evmkMP16bzZeWz5NyNlO+dDsPEi4CpsX86T4ajH+LFpN66yy77zxZxwPAr8f0dL2D8fU8MM3dLzCzH7n7P2EEE71z/PeARxnepDQUX6niqms6YacfoxIcUoepfAFGuN0brJG3Ky6n+HIeJ1TE1fmKL3mIcGdTK8/sON2T14sx7WiN5Q1RaXKrVbbijqrIu70N21Bsx3FCgKi1vMWE4ONV0+oFjWLdxQkzjdrfSVERNcpXbENacdWqcNN8xTLqBY1my0eNZVXvr1a2oWzZijtTgHV1ynYsSTPg6RrLKbwd/6bnVrqsI8lyuhh+LFWv06hc/b7cIF+9stXKV9jdhnyWvN5usXzpd3NK8j4NGkXQKZqsRyrbSPvuQPxcLO9STgzoDuyPec8DPmxmf0lorh7RhA4c7v49QnPA94E/JUTWywiVwQeBfw28BbwCHCJcfe8iPHH5NidWzFSl/c/490jMX5yo6YE1DbicyhdxlOFfZtFcNINQEfyESgUBoamjyOuEdtAPEA6SbTXW2xWXOS++f47wZaeVSHoCX0EIiK+W3IZ9VdtQNHNMi6+LCXdF/1C1vGmx7MUBvJ9KhTnaFULZE70TFRZUKtVjdfK0u7L6SVX6RZwY0GF4n9ggJ15wpYpfTS3u7qudROVKeIDw/UM4Vuqtc4Dw23OtlC09PgcIFWEr+aYn+Y4RjpMiUDdTPhh+ThQXdseT6U44RtL86TlYa52N9l3RVE8s21wqx1+xXovb+t9jnvcIdeYXGmxHpYATuamqLDObB/wZIajMJOykIcJJ8IuESrhoyioOkuLg/w7hpNlLqIjnAu8QAsFRwoEwHziH8IUdotIENEi4PX2P8EV9mHDFvieW45PA
w4ST7khczj5Cv8CvE67iC6cz/G5oSSxLcdtbHJzVfRBHCLeij1Ppz5gP/DGhPfzDcf1nEdrJ35/MW/xsc9GsNZvKgT+Q7mLCQZi2pf8p4afuq5vd0gptkNBGv4wTK9s03wBhJNwFNfIVeYsTYxshuFW351Yv7+8IFxDWIF+7y7cT+IU2bcNIZfsp4fs7Qjgmi6va7wMfJ3SavxennRy34S1Cx+3ThO/63xAqk71xHUWH6lZCR+tiwj9Z+6W4bbOpHKc7gL8EniC0n380WeeRmLe4UHmJcIzOL1m2mYQLoksJg0MOEJpjnXBupeUrm6/Yjjfjvj0tbvNewkXX3xPOkbLlu5LQt/XTuK4fEc6lBYRWgKvj93NW3B+nxO9nJyFYQahnuqrWme67dwj9XLX23buEc/KVuL6dsayXxfUuieV9B5jh7jeQYUoEjkbM7POECu8MQiX6acKVcxe1T/CxUquJYqT8ZMyTNo0UdxM589dbZtFEUgSX+wgnRdkTLj2BD9DeE30sKqxphN9Lu59wUVKm0m1XZZWWbReh0nqdUGmsBU529531vrxOiSMahyVR467G3feWXN6pZfKN9tBxM+sGVgLXEr6/+QwfhThAqOAPULlD2EMYQLHO3QdogpnNAm6O6/gG4UL0E4QL5QcIjygsIfxDvH9FqPN+jXAu3O7uIzZXKXCYvZ58/EfpJIYHj3o7KqeirpX3OMPXUdxKF3c+xdV6ejJVn1g1T7RkeWUV66w3cuoolbup6m2pbpJ4Nf4dcPePZpRhzIxhhTWP2HQzFs+5mNkc4L8QRlbNo9wxWjS13kcYkv6tmN5FuJs+nUrnb9GnWN3smR6z6TFUdPgWin6/4u60mxAU02aX6lGOs+L6pldtT/XxnXu8H2J4s+P3gX9MGADSRQjS7XzeLd3G6rvdo7E8RwgXD5cS/nU2Sf4PES4mqgf5FNvghAuQnybvneHDgY8Tmq+LpuViGP4bwKnu/tmRNmJKBA4zexroJRzQzV5Rlwkc1XnSkRvVo5jSNtEuKidSN8P7WdLAUbRPFs1TRXr6Oe2MKzoop3HisyL1tqGRY9R/nqRoBss5yepVVlC7wipUd0QX23yM0Ow2neH7oTAeKqxOVFZFk2mx7MJhKv0rXVT6zwaoVDLpcdPJO/BUmUqr1oVaTvmrL+iqK/lU2lRcXHxMS6YVTY7FwBancpztI9wtGuHuY4hwx3gOof/y7GQ9s+N8Iw2dL8qclqf6IrI4h7qB1wjHwpPABe7+kRGWPWUCx1tUnlUoXkWFkw5Bre6wKqRNOSM5nLzvj+tdTKVCK6SV32g+wV90vO1keED638AtVDrO2lUpFE1gRSCrvntpVFnNTPKlDx1OpAqr+s4wR6uVVVqGdJlFsHyP4X1Y9S4Cyqq1P4oLnGl18qTTiqaYohzVF1PF4IpietGBXGboeyMDVOqDsooAUMtRhp/bxfdW/Z0MxL/dVLaluMhpVZkL2/ScLOqfdwkDO36+HHe/YKSVTZXAsZbaz2NA6LQrvEz94WgPJO8PUhk1kjqlav60jfJSYHl8fywuo5aX3f0P4OdtlWdVTZ9NuHvqoVIJDBCuXKAy+uUooZ37BUIH2DDu/qKZnUllv5wM/DLDO+QLJxGa8WYTDridhFFVqX8bpxWBIbU0ed9qZQUjnyS1po9VhVWvSXGAcs8QpXIrq0L11Xa9Ybvp0O1UcccxVGMalL/Qqe7zgsrIopOTMhbbMdLFVK2A7FXvGwXT4vPxuK7ib6r4tYduTrzLrCf9ntKAX93K0Oj5nXTacUK/x2yG74da30Wti9p0P6TvB6jcaR+h8gxHUY9MB95x90/UKePPTYnAIaPPzP6KEMyq27bnMLx5pFFlNVLFWtyllFE95HEsK6zqNE/+HmX0KqvCAJW7ubKc0E+1ldDe/SnCxchcKs1WhfmxzB+o2oa0ybQo4yHCBVVx8fJykucMwqitWUn+Qq0RZe9SGWhQGKBSQU+n3KCW43FZ
/Qwflg1heOo9hL6ELk5sKWhGOgQ2/VtLcbf+MmEk5DnJtEXUvvCaw/CHfNPjIW3ehtAs+pG4nC8RRr49FJdxCDjfSwQFBQ5pizjk+TZCh+xcynfIFpXVIww/Sc7ixAqLuOwPVKXVGkM/xPA7rU5XWE44MV8iDJEc7crKCXe126jcBR8jjMQqax4h0L7B8GD3C4TRWu3QReVZpkaGqNxVTxS/TOi3KpqIFzC8T7J4insa4U59iLBfF8XPbzS53jOp3EUYlZGExL9F36sR9ukA8BjhnNvk7v9tpBUocMioi0Oex5v3UfunU6oNEoZItsMngL9t07Ia+RTjZ1i5DFf9UyM5P2XTiur1pP0dB+LrXuDX1Dku40LVkOep7ExO7BsaDc0MK69Wr8Pda0yTEzUasAAnNi/WG67fqlpNpukIzWOEJrti+vEyv1U14f8fh4wPbRryPBUs6XQBmjTSKC9prLoCP8Lwc6XsD6XmrutncblG5WdMZlD5+ZvnqdyJFP2PI1LgkHZZQGiXLX4evt6Q58mu0ZX9WF2pt2NYOTQOEgog5aTHQxp8XyM0g55N6NdwKiMoi+nVIyrLepnK8x9F5/85Vet14LcJ/Rv3JGXsK7MCNVVJW2QMeZ7s5lJ7SDIkT5CPsrLDyhuZR2VUVLVeQie/1DeP8GOTu+LndAj4+YTfwDub8GzNDuBUwuAICP0Nbyafc6XDgNP1Fss9Gzjg7s+Y2WmEn1qiSCuzAgUOERHJMha9+SIiMokocIiISBYFDhERyaLAISIiWf4/ZElnM8kNpK8AAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
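   ],
   "source": [
    "# Bar chart of user counts per city, one bar per distinct city value, sorted by count (descending).\n",
    "user_features['city'].value_counts().plot(kind='bar')"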
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-20T13:23:13.045580Z",
     "iopub.status.busy": "2021-07-20T13:23:13.045406Z",
     "iopub.status.idle": "2021-07-20T13:23:13.160810Z",
     "shell.execute_reply": "2021-07-20T13:23:13.160333Z",
     "shell.execute_reply.started": "2021-07-20T13:23:13.045565Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<AxesSubplot:>"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXQAAAEACAYAAACj0I2EAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/Il7ecAAAACXBIWXMAAAsTAAALEwEAmpwYAAAQjUlEQVR4nO3df5BddX3G8fdDAlZFaadZHUoSl6lBzYg/V3CKU7D+aACHTFvrkFpbHTD/iLXVOqTVgRanM1g7tXYGtWlFqh2hYFsbSzR0lJZWxWYRBAIDRkAJtbIi6ihWSP30j3vQy7LJvUlOcne/vl8zO3vO93xz7pNN9snZc849SVUhSVr6Dpt0AElSPyx0SWqEhS5JjbDQJakRFrokNcJCl6RGTLTQk1yc5N4kN485/1VJbkmyI8lHDnY+SVpKMsn70JP8IvBd4ENV9cwRc9cAlwO/VFX3J3lSVd17KHJK0lIw0SP0qroG+ObwWJKfT/LJJNcl+Y8kT+82vR64qKru736tZS5JQxbjOfTNwBur6vnA7wPv7caPA45L8pkk1yZZN7GEkrQILZ90gGFJjgR+AbgiycPDj+k+LwfWAKcAK4FrkhxfVd86xDElaVFaVIXO4CeGb1XVcxbYtgv4fFU9BNyZ5HYGBb/9EOaTpEVrUZ1yqarvMCjrXwfIwLO7zR9jcHROkhUMTsHcMYGYkrQoTfq2xUuBzwFPS7IryVnAq4GzknwR2AGs76ZvA+5LcgtwNfDWqrpvErklaTGa6G2LkqT+LKpTLpKk/WehS1IjJnaXy4oVK2p6enpSLy9JS9J11133jaqaWmjbxAp9enqa2dnZSb28JC1JSb6yp22ecpGkRows9HGfiJjkBUl2J3llf/EkSeMa5wj9EmCvz01Jsgx4J3BVD5kkSfthZKEv9ETEBbwR+AfAJyBK0oQc8Dn0JMcAvwK878DjSJL2Vx8XRf8COLeqfjhqYpKNSWaTzM7NzfXw0pKkh/Vx2+IMcFn3uNsVwGlJdlfVx+ZPrKrNDJ53zszMjM8ckKQeHXChV9WxDy8nuQT4l4XKXJJ0cI0s9O6JiKcAK5LsAs4HDgeoqvcf1HSd6U1X9r7Puy48vfd9StIkjSz0qtow7s6q6rUHlEaStN98p6gkNcJCl6RGWOiS1AgLXZIaYaFLUiMsdElqhIUuSY2w0CWpERa6JDXCQpekRljoktQIC12SGmGhS1IjLHRJaoSFLkmNsNAlqREWuiQ1wkKXpEZY6JLUCAtdkhphoUtSI0YWepKLk9yb5OY9bH91khuT3JTks0me3X9MSdIo4xyhXwKs28v2O4GTq+p44B3A5h5ySZL20fJRE6rqmiTTe9n+2aHVa4GVPeSSJO2jvs+hnwV8oud9SpLGMPIIfVxJXsyg0F+0lzkbgY0Aq1ev7uulJUn0dISe5FnA3wDrq+q+Pc2rqs1VNVNVM1NTU328tCSpc8BH6ElWA/8IvKaqbj/wSEvX9KYre93fXRee3uv+JLVtZKEnuRQ4BViRZBdwPnA4QFW9HzgP+FngvUkAdlfVzMEKLEla2Dh3uWwYsf1s4OzeEkmS9ovvFJWkRljoktQIC12SGmGhS1IjLHRJaoSFLkmNsNAlqREWuiQ1wkKXpEZY6JLUCAtdkhphoUtSIyx0SWqEhS5JjbDQJakRFrokNcJCl6RGWOiS1AgLXZIaYaFLUiMsdElqxPJRE5JcDLwCuLeqnrnA9gDvAU4DHgBeW1Vf6Duo+jG96cre93nXhaf3vk9J+26cI/RLgHV72X4qsKb72Ai878BjSZL21chCr6prgG/uZcp64EM1cC3w00mO7iugJGk8fZxDPwa4e2h9VzcmSTqEDulF0SQbk8wmmZ2bmzuULy1JzRt5UXQM9wCrhtZXdmOPUlWbgc0AMzMz1cNrq1FevJX2XR9H6FuA38rAC4FvV9XXetivJGkfjHPb4qXAKcCKJLuA84HDAarq/cBW
Brcs7mRw2+LrDlZYSdKejSz0qtowYnsBb+gtkSRpv/hOUUlqhIUuSY2w0CWpERa6JDXCQpekRljoktQIC12SGmGhS1IjLHRJaoSFLkmNsNAlqREWuiQ1wkKXpEZY6JLUCAtdkhphoUtSIyx0SWqEhS5JjbDQJakRFrokNcJCl6RGjFXoSdYluS3JziSbFti+OsnVSa5PcmOS0/qPKknam5GFnmQZcBFwKrAW2JBk7bxpbwcur6rnAmcC7+07qCRp78Y5Qj8B2FlVd1TVg8BlwPp5cwp4Yrd8FPDf/UWUJI1j+RhzjgHuHlrfBZw4b84fAVcleSPweOClvaSTJI2tr4uiG4BLqmolcBrw4SSP2neSjUlmk8zOzc319NKSJBiv0O8BVg2tr+zGhp0FXA5QVZ8DfgpYMX9HVbW5qmaqamZqamr/EkuSFjROoW8H1iQ5NskRDC56bpk356vASwCSPINBoXsILkmH0Mhz6FW1O8k5wDZgGXBxVe1IcgEwW1VbgLcAf53k9xhcIH1tVdXBDC4tBtObrux9n3ddeHrv+9RPhnEuilJVW4Gt88bOG1q+BTip32iSpH3hO0UlqREWuiQ1wkKXpEZY6JLUCAtdkhphoUtSIyx0SWqEhS5JjbDQJakRFrokNcJCl6RGWOiS1AgLXZIaYaFLUiMsdElqhIUuSY2w0CWpERa6JDXCQpekRljoktQIC12SGjFWoSdZl+S2JDuTbNrDnFcluSXJjiQf6TemJGmU5aMmJFkGXAS8DNgFbE+ypapuGZqzBvgD4KSquj/Jkw5WYEnSwsY5Qj8B2FlVd1TVg8BlwPp5c14PXFRV9wNU1b39xpQkjTJOoR8D3D20vqsbG3YccFySzyS5Nsm6vgJKksYz8pTLPuxnDXAKsBK4JsnxVfWt4UlJNgIbAVavXt3TS0uSYLwj9HuAVUPrK7uxYbuALVX1UFXdCdzOoOAfoao2V9VMVc1MTU3tb2ZJ0gLGKfTtwJokxyY5AjgT2DJvzscYHJ2TZAWDUzB39BdTkjTKyEKvqt3AOcA24Fbg8qrakeSCJGd007YB9yW5BbgaeGtV3XewQkuSHm2sc+hVtRXYOm/svKHlAt7cfUiSJsB3ikpSIyx0SWqEhS5JjbDQJakRFrokNcJCl6RGWOiS1AgLXZIaYaFLUiMsdElqhIUuSY2w0CWpERa6JDXCQpekRljoktQIC12SGmGhS1IjxvofiyQtbdObrux9n3ddeHrv+9SB8QhdkhphoUtSIyx0SWrEWIWeZF2S25LsTLJpL/N+LUklmekvoiRpHCMLPcky4CLgVGAtsCHJ2gXmPQF4E/D5vkNKkkYb5wj9BGBnVd1RVQ8ClwHrF5j3DuCdwP/2mE+SNKZxCv0Y4O6h9V3d2I8keR6wqqr6vzdKkjSWA74omuQw4M+Bt4wxd2OS2SSzc3NzB/rSkqQh4xT6PcCqofWV3djDngA8E/i3JHcBLwS2LHRhtKo2V9VMVc1MTU3tf2pJ0qOMU+jbgTVJjk1yBHAmsOXhjVX17apaUVXTVTUNXAucUVWzByWxJGlBIwu9qnYD5wDbgFuBy6tqR5ILkpxxsANKksYz1rNcqmorsHXe2Hl7mHvKgceSJO0r3ykqSY2w0CWpERa6JDXCQpekRljoktQIC12SGmGhS1IjLHRJaoSFLkmNsNAlqREWuiQ1wkKXpEZY6JLUCAtdkhphoUtSIyx0SWqEhS5JjbDQJakRFrokNcJCl6RGWOiS1AgLXZIaMVahJ1mX5LYkO5NsWmD7m5PckuTGJJ9K8pT+o0qS9mZkoSdZBlwEnAqsBTYkWTtv2vXATFU9C/go8Kd9B5Uk7d04R+gnADur6o6qehC4DFg/PKGqrq6qB7rVa4GV/caUJI0yTqEfA9w9tL6rG9uTs4BPLLQhycYks0lm5+bmxk8pSRqp14uiSX4TmAHetdD2qtpcVTNVNTM1NdXnS0vST7zlY8y5B1g1tL6yG3uEJC8F
3gacXFU/6CeeJGlc4xyhbwfWJDk2yRHAmcCW4QlJngv8FXBGVd3bf0xJ0igjC72qdgPnANuAW4HLq2pHkguSnNFNexdwJHBFkhuSbNnD7iRJB8k4p1yoqq3A1nlj5w0tv7TnXJKkfeQ7RSWpERa6JDXCQpekRljoktQIC12SGmGhS1IjLHRJaoSFLkmNsNAlqREWuiQ1wkKXpEZY6JLUiLEeziVJh8L0pit73+ddF57e+z4XK4/QJakRFrokNcJCl6RGWOiS1AgLXZIaYaFLUiMsdElqhIUuSY0Yq9CTrEtyW5KdSTYtsP0xSf6+2/75JNO9J5Uk7dXIQk+yDLgIOBVYC2xIsnbetLOA+6vqqcC7gXf2HVSStHfjHKGfAOysqjuq6kHgMmD9vDnrgb/tlj8KvCRJ+ospSRolVbX3CckrgXVVdXa3/hrgxKo6Z2jOzd2cXd36l7s535i3r43Axm71acBtff1GOiuAb4ycNXnm7Jc5+7MUMsJPds6nVNXUQhsO6cO5qmozsPlg7T/JbFXNHKz998Wc/TJnf5ZCRjDnnoxzyuUeYNXQ+spubME5SZYDRwH39RFQkjSecQp9O7AmybFJjgDOBLbMm7MF+O1u+ZXAp2vUuRxJUq9GnnKpqt1JzgG2AcuAi6tqR5ILgNmq2gJ8APhwkp3ANxmU/iQctNM5PTNnv8zZn6WQEcy5oJEXRSVJS4PvFJWkRljoktQIC12SGrGkCz3J05Ocm+Qvu49zkzxj0rmWqu7r+ZIkR84bXzepTPMlOSHJC7rltUnenOS0SecaJcmHJp1hlCQv6r6eL590lmFJTkzyxG75sUn+OMnHk7wzyVGTzvewJL+TZNXomQcxw1K9KJrkXGADg0cR7OqGVzK4w+ayqrpwUtnGleR1VfXBSeeAwV9G4A3ArcBzgDdV1T93275QVc+bYDy6HOczeKbQcuBfgROBq4GXAduq6k8mGO9Hksy/rTfAi4FPA1TVGYc81AKS/FdVndAtv57Bn/8/AS8HPr5YvoeS7ACe3d1xtxl4gO4RI934r040YCfJt4HvAV8GLgWuqKq5QxqiqpbkB3A7cPgC40cAX5p0vjF/D1+ddIahLDcBR3bL08Asg1IHuH7S+YYyLgMeB3wHeGI3/ljgxknnG8r5BeDvgFOAk7vPX+uWT550vqGc1w8tbwemuuXHAzdNOt9QtluHv7bztt0w6XzDX08GZz1ezuBW7jngkwzeo/OEQ5HhkL71v2c/BH4O+Mq88aO7bYtCkhv3tAl48qHMMsJhVfVdgKq6K8kpwEeTPIVB1sVgd1X9H/BAki9X1XcAqur7SRbNnzkwA7wJeBvw1qq6Icn3q+rfJ5xrvsOS/AyDEkp1R5NV9b0kuycb7RFuHvpp9otJZqpqNslxwEOTDjekquqHwFXAVUkOZ/AT5Qbgz4AFn7/Sp6Vc6L8LfCrJl4C7u7HVwFOBc/b0iybgycAvA/fPGw/w2UMfZ4++nuQ5VXUDQFV9N8krgIuB4yea7MceTPK4qnoAeP7Dg9151EVT6N039buTXNF9/jqL83vtKOA6Bn8XK8nRVfW17hrKYvlHHOBs4D1J3s7gQVefS3I3g+/7syea7JEe8TWrqocYvIt+S5LHHZIA3Y8KS1KSwxg83veYbugeYHt3FLcoJPkA8MGq+s8Ftn2kqn5jArEeJclKBkfA/7PAtpOq6jMTiDU/x2Oq6gcLjK8Ajq6qmyYQa6QkpwMnVdUfTjrLOLryeXJV3TnpLMO6C6PHMvjHcVdVfX3CkR4hyXFVdftEMyzlQpck/diSvm1RkvRjFrokNcJCl6RGWOiS1AgLXZIa8f8IURB/xQS4qAAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Bar chart of user counts per city level (city tier), sorted by count (descending).\n",
    "user_features['city_level'].value_counts().plot(kind='bar')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-20T13:23:13.161933Z",
     "iopub.status.busy": "2021-07-20T13:23:13.161757Z",
     "iopub.status.idle": "2021-07-20T13:23:29.105894Z",
     "shell.execute_reply": "2021-07-20T13:23:29.105382Z",
     "shell.execute_reply.started": "2021-07-20T13:23:13.161917Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<AxesSubplot:>"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAY4AAAEICAYAAABI7RO5AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/Il7ecAAAACXBIWXMAAAsTAAALEwEAmpwYAAAdn0lEQVR4nO3df5Ac5X3n8fdHu5b4KSShBduSiHS24pzgnBhvQInvUg7YsDgui8oRn0jK6HKcdReDk7tKxRaXu6LKdqqwkzIXqoAqxSgWPs6CI/ahC9iKDnC4XJ2AlXHAAivagLEkA1qQEFgYSSt974/+jrcZze5Oz4w0u6vPq2pqup9+uvvp3tn5THc/06OIwMzMrFkzut0AMzObWhwcZmZWiYPDzMwqcXCYmVklDg4zM6vEwWFmZpX0drsBnTZ//vxYvHhxt5thZjalbN269eWI6Gum7rQLjsWLFzM4ONjtZpiZTSmSnm+2rk9VmZlZJQ4OMzOrxMFhZmaVODjMzKwSB4eZmVXi4DAzs0ocHGZmVsmEwSFpnaQ9kr5fV/5pST+QtE3Sl0rlN0gakrRd0uWl8oEsG5K0plS+RNKjWX63pJlZPivHh3L64o5ssZmZtaWZI46vAgPlAkm/DqwAfjEizgf+LMuXASuB83Oe2yT1SOoBbgWuAJYBV2ddgC8CN0fEu4F9wLVZfi2wL8tvznpmZtZlEwZHRDwC7K0r/j3gpog4mHX2ZPkKYENEHIyI54Ah4KJ8DEXEsxFxCNgArJAk4BLg3px/PXBlaVnrc/he4NKsb2ZmXdTqNY6fB/5FnkL6W0m/nOULgJ2leruybKzys4FXI2Kkrvwty8rp+7P+MSStljQoaXB4eLjFTTIzs2a0Ghy9wDxgOfBHwD3dPBqIiLUR0R8R/X19Td2jy8zMWtRqcOwCvhGFx4CjwHxgN7CoVG9hlo1V/gowR1JvXTnleXL6WVnfzMy6qNXg+J/ArwNI+nlgJvAysBFYmT2ilgBLgceAx4Gl2YNqJsUF9I0REcDDwFW53FXAfTm8McfJ6Q9lfTMz66IJb6su6evAB4H5knYBNwLrgHXZRfcQsCrf1LdJugd4GhgBrouII7mc64FNQA+wLiK25So+C2yQ9AXgCeCOLL8D+JqkIYqL8ys7sL1mZtYmTbcP8f39/eHf4zAzq0bS1ojob6auvzluZmaVODjMzKwSB4eZmVXi4DAzs0ocHGZmVomDw8zMKnFwmJlZJQ4OMzOrxMFhZmaVODjMzKwSB4eZmVXi4DAzs0ocHGZmVomDw8zMKnFwmJlZJQ4OMzOrZMLgkLRO0p78tb/6aX8oKSTNz3FJukXSkKQnJV1YqrtK0o58rCqVv1/SUznPLZKU5fMkbc76myXN7cwmm5lZO5o54vgqMFBfKGkRcBnwo1LxFRS/M74UWA3cnnXnUfzk7MXARcCNpSC4Hfhkab7autYAD0bEUuDBHDczsy6bMDgi4hGK3/yudzPwGaD827MrgDujsAWYI+kdwOXA5ojYGxH7gM3AQE6bHRFb8jfL7wSuLC1rfQ6vL5WbmVkXtXSNQ9IKYHdE/H3dpAXAztL4riwbr3xXg3KAcyPihRx+ETi3lbaamVln9VadQdJpwH+iOE11QkRESIqxpktaTXFqjPPOO+9ENcvM7KTUyhHHu4AlwN9L+iGwEPiupLcDu4FFpboLs2y88oUNygFeylNZ5POesRoUEWsjoj8i+vv6+lrYJDMza1bl4IiIpyLinIhYHBGLKU4vXRgRLwIbgWuyd9VyYH+ebtoEXCZpbl4UvwzYlNNek7Q8e1NdA9yXq9oI1HpfrSqVm5lZFzXTHffrwP8D3iNpl6Rrx6n+APAsMAT8BfApgIjYC3weeDwfn8syss5Xcp5/BL6V5TcBH5a0A/hQjpuZWZep6Mw0ffT398fg4GC3m2FmNqVI2hoR/c3U9TfHzcysEgeHmZlV4uAwM7NKHBxmZlaJ
g8PMzCpxcJiZWSUODjMzq8TBYWZmlTg4zMysEgeHmZlV4uAwM7NKHBxmZlaJg8PMzCpxcJiZWSUODjMzq8TBYWZmlTg4zMyskmZ+OnadpD2Svl8q+1NJP5D0pKRvSppTmnaDpCFJ2yVdXiofyLIhSWtK5UskPZrld0uameWzcnwopy/u1EabmVnrmjni+CowUFe2GbggIt4L/ANwA4CkZcBK4Pyc5zZJPZJ6gFuBK4BlwNVZF+CLwM0R8W5gH1D7TfNrgX1ZfnPWMzOzLpswOCLiEWBvXdnfRMRIjm4BFubwCmBDRByMiOeAIeCifAxFxLMRcQjYAKyQJOAS4N6cfz1wZWlZ63P4XuDSrG9mZl3UiWsc/wb4Vg4vAHaWpu3KsrHKzwZeLYVQrfwty8rp+7O+mZl1UVvBIemPgRHgrs40p+V2rJY0KGlweHi4m00xM5v2Wg4OSf8a+CjwOxERWbwbWFSqtjDLxip/BZgjqbeu/C3LyulnZf1jRMTaiOiPiP6+vr5WN8nMzJrQUnBIGgA+A3wsIt4oTdoIrMweUUuApcBjwOPA0uxBNZPiAvrGDJyHgaty/lXAfaVlrcrhq4CHSgFlZmZd0jtRBUlfBz4IzJe0C7iRohfVLGBzXq/eEhH/PiK2SboHeJriFNZ1EXEkl3M9sAnoAdZFxLZcxWeBDZK+ADwB3JHldwBfkzREcXF+ZQe218zM2qTp9iG+v78/BgcHu90MM7MpRdLWiOhvpq6/OW5mZpU4OMzMrBIHh5mZVeLgMDOzShwcZmZWiYPDzMwqcXCYmVklDg4zM6vEwWFmZpU4OMzMrBIHh5mZVeLgMDOzShwcZmZWiYPDzMwqcXCYmVklDg4zM6vEwWFmZpVMGByS1knaI+n7pbJ5kjZL2pHPc7Nckm6RNCTpSUkXluZZlfV3SFpVKn+/pKdynluUv0U71jrMzKy7mjni+CowUFe2BngwIpYCD+Y4wBXA0nysBm6HIgQofqv8YuAi4MZSENwOfLI038AE6zAzsy6aMDgi4hFgb13xCmB9Dq8HriyV3xmFLcAcSe8ALgc2R8TeiNgHbAYGctrsiNgSxY+f31m3rEbrMDOzLmr1Gse5EfFCDr8InJvDC4CdpXq7smy88l0NysdbxzEkrZY0KGlweHi4hc0xM7NmtX1xPI8UogNtaXkdEbE2Ivojor+vr+94NsXM7KTXanC8lKeZyOc9Wb4bWFSqtzDLxitf2KB8vHWYmVkXtRocG4Faz6hVwH2l8muyd9VyYH+ebtoEXCZpbl4UvwzYlNNek7Q8e1NdU7esRuswM7Mu6p2ogqSvAx8E5kvaRdE76ibgHknXAs8DH8/qDwAfAYaAN4DfBYiIvZI+Dzye9T4XEbUL7p+i6Ll1KvCtfDDOOszMrItUXD6YPvr7+2NwcLDbzTAzm1IkbY2I/mbq+pvjZmZWiYPDzMwqcXCYmVklDg4zM6vEwWFmZpU4OMzMrBIHh5mZVeLgMDOzShwcZmZWiYPDzMwqcXCYmVklDg4zM6vEwWFmZpU4OMzMrBIHh5mZVeLgMDOzStoKDkn/UdI2Sd+X9HVJp0haIulRSUOS7pY0M+vOyvGhnL64tJwbsny7pMtL5QNZNiRpTTttNTOzzmg5OCQtAH4f6I+IC4AeYCXwReDmiHg3sA+4Nme5FtiX5TdnPSQty/nOBwaA2yT1SOoBbgWuAJYBV2ddMzPronZPVfUCp0rqBU4DXgAuAe7N6euBK3N4RY6T0y+VpCzfEBEHI+I5it8rvygfQxHxbEQcAjZkXTMz66KWgyMidgN/BvyIIjD2A1uBVyNiJKvtAhbk8AJgZ847kvXPLpfXzTNWuZmZdVE7p6rmUhwBLAHeCZxOcarphJO0WtKgpMHh4eFuNMHM7KTRzqmqDwHPRcRwRBwGvgF8AJiTp64AFgK7c3g3sAggp58FvFIur5tnrPJjRMTaiOiPiP6+vr42NsnMzCbSTnD8CFgu6bS8VnEp
8DTwMHBV1lkF3JfDG3OcnP5QRESWr8xeV0uApcBjwOPA0uylNZPiAvrGNtprZmYd0DtxlcYi4lFJ9wLfBUaAJ4C1wP3ABklfyLI7cpY7gK9JGgL2UgQBEbFN0j0UoTMCXBcRRwAkXQ9souixtS4itrXaXjMz6wwVH/qnj/7+/hgcHOx2M8zMphRJWyOiv5m60/Kb44vX3N/tJpiZTVvTMjjMzOz4cXCYmVklDg4zM6vEwWFmZpU4OMzMrBIHh5mZVeLgMDOzShwcZmZWiYPDzMwqcXCYmVklDg4zM6vEwWFmZpU4OMzMrBIHh5mZVeLgMDOzShwcZmZWSVvBIWmOpHsl/UDSM5J+RdI8SZsl7cjnuVlXkm6RNCTpSUkXlpazKuvvkLSqVP5+SU/lPLfkb5ubmVkXtXvE8efAtyPiF4BfBJ4B1gAPRsRS4MEcB7gCWJqP1cDtAJLmATcCFwMXATfWwibrfLI030Cb7TUzsza1HBySzgJ+DbgDICIORcSrwApgfVZbD1yZwyuAO6OwBZgj6R3A5cDmiNgbEfuAzcBATpsdEVui+GH0O0vLMjOzLmnniGMJMAz8paQnJH1F0unAuRHxQtZ5ETg3hxcAO0vz78qy8cp3NSg/hqTVkgYlDQ4PD7exSWZmNpF2gqMXuBC4PSLeBxxg9LQUAHmkEG2soykRsTYi+iOiv6+v72fli9fcf7xXbWZ20mknOHYBuyLi0Ry/lyJIXsrTTOTznpy+G1hUmn9hlo1XvrBBuZmZdVHLwRERLwI7Jb0niy4FngY2ArWeUauA+3J4I3BN9q5aDuzPU1qbgMskzc2L4pcBm3Laa5KWZ2+qa0rLMjOzLultc/5PA3dJmgk8C/wuRRjdI+la4Hng41n3AeAjwBDwRtYlIvZK+jzweNb7XETszeFPAV8FTgW+lQ8zM+uitoIjIr4H9DeYdGmDugFcN8Zy1gHrGpQPAhe000YzM+ssf3PczMwqcXCYmVklDg4zM6vEwWFmZpU4OMzMrJKTIjj8DXIzs845KYLDzMw6x8FhZmaVODjMzKwSB4eZmVVy0gSHL5CbmXXGSRMcZmbWGQ4OMzOrxMFhZmaVODjMzKwSB4eZmVXSdnBI6pH0hKS/zvElkh6VNCTp7vx1QCTNyvGhnL64tIwbsny7pMtL5QNZNiRpTbttNTOz9nXiiOMPgGdK418Ebo6IdwP7gGuz/FpgX5bfnPWQtAxYCZwPDAC3ZRj1ALcCVwDLgKuzrpmZdVFbwSFpIfAbwFdyXMAlwL1ZZT1wZQ6vyHFy+qVZfwWwISIORsRzFL9JflE+hiLi2Yg4BGzIumZm1kXtHnH8V+AzwNEcPxt4NSJGcnwXsCCHFwA7AXL6/qz/s/K6ecYqNzOzLmo5OCR9FNgTEVs72J5W27Ja0qCkweHh4W43x8xsWmvniOMDwMck/ZDiNNIlwJ8DcyT1Zp2FwO4c3g0sAsjpZwGvlMvr5hmr/BgRsTYi+iOiv6+vr41NMjOzibQcHBFxQ0QsjIjFFBe3H4qI3wEeBq7KaquA+3J4Y46T0x+KiMjyldnragmwFHgMeBxYmr20ZuY6Nrba3hrfs8rMrD29E1ep7LPABklfAJ4A7sjyO4CvSRoC9lIEARGxTdI9wNPACHBdRBwBkHQ9sAnoAdZFxLbj0F4zM6ugI8EREd8BvpPDz1L0iKqv8ybwW2PM/yfAnzQofwB4oBNtNDOzzvA3x83MrBIHh5mZVeLgMDOzShwcZmZWyUkbHO6Wa2bWmpM2OMzMrDUODjMzq8TBYWZmlTg4zMysEgeHmZlVctIHh3tXmZlVc9IHh5mZVePgMDOzShwcZmZWiYPDzMwqcXAkXyQ3M2uOg8PMzCppOTgkLZL0sKSnJW2T9AdZPk/SZkk78nlulkvSLZKGJD0p6cLSslZl/R2SVpXK3y/pqZznFklqZ2PNzKx97RxxjAB/GBHLgOXAdZKWAWuAByNiKfBgjgNcASzNx2rgdiiC
BrgRuJjiJ2dvrIVN1vlkab6BNtprZmYd0HJwRMQLEfHdHH4deAZYAKwA1me19cCVObwCuDMKW4A5kt4BXA5sjoi9EbEP2AwM5LTZEbElIgK4s7Ss48bXOszMxteRaxySFgPvAx4Fzo2IF3LSi8C5ObwA2FmabVeWjVe+q0H5cefwMDMbW9vBIekM4K+A/xARr5Wn5ZFCtLuOJtqwWtKgpMHh4eHjvTozs5NaW8Eh6W0UoXFXRHwji1/K00zk854s3w0sKs2+MMvGK1/YoPwYEbE2Ivojor+vr6+dTTIzswm006tKwB3AMxHx5dKkjUCtZ9Qq4L5S+TXZu2o5sD9PaW0CLpM0Ny+KXwZsymmvSVqe67qmtCwzM+uSdo44PgB8ArhE0vfy8RHgJuDDknYAH8pxgAeAZ4Eh4C+ATwFExF7g88Dj+fhclpF1vpLz/CPwrTbaW5mvdZiZHau31Rkj4u+Asb5XcWmD+gFcN8ay1gHrGpQPAhe02kYzM+s8f3O8CT7yMDMb5eAwM7NKHBwV+MjDzMzBYWZmFTk4zMysEgdHC3zKysxOZg4OMzOrxMFhZmaVODja4FNWZnYycnC0yeFhZicbB0eHOEDM7GTh4OgwB4iZTXcOjuPEAWJm05WD4zhzgJjZdOPgOEEWr7nfIWJm04KDowtqAeIgMbOpyMExCThAzGwqmfTBIWlA0nZJQ5LWdLs9x1M5QBwmZjZZTergkNQD3ApcASwDrpa0rLutOrF8WsvMJptJHRzARcBQRDwbEYeADcCKLrepqxoFicPFzE4kRUS32zAmSVcBAxHxb3P8E8DFEXF9Xb3VwOocfR9wlCIUm3nGdZuuO1Xa6bquO1nWPZXqvhERZ9KE3mYqTXYRsRZYCyDpCKPbNaPJZ9dtvu5Uaafruu5kWfdUqbudJs2YuEpX7QYWlcYXZpmZmXXJZA+Ox4GlkpZImgmsBDZ2uU1mZie1SX2qKiJGJF0PbAJ6gHURsW2C2R4H3gWcArzZxDOu23TdqdJO13XdybLuqVR3LU2a1BfHzcxs8pnsp6rMzGyScXCYmVklDg4zM6tkUl8cb4akh4F3AiMU2/Mu4Aiwi+Kb5z3AkYh4pW6+90TEdknvAfaV69SmncDNGJekeQARsVfSJRHxUK0svRf4PnAe8KOIeLnBfBcCrwF7qy6nvCxgcf206ax+uwEi4uUs/wDwOqP77Gf7twtNNTthpvTFcUmvA2ccx1W8CfxqRDxRZSZJS4H5wEeBnwN+CZgNzAUOAG8Dflpax+nAmcBBRns5nEERhDOA2h9JTaz+CKNHkmL0W6FVHcn1iiJ8j+ajN8sPA8MUoXsacE62/23Ay7lNOym+e3MWxfb25vQZwI5czrx8HKYI+zkU+6C27T+l+O7O6RT77/RcTy/Ffjqa638513OU4kPEGTl9mGKf/1PgUC77DWArcAGwH1iQ2zkjl/0KxfeHZuW+qLoP38hH7W9dW+cp2bYzc//OzP13etZ9CTgVeBT4JvAe4N35uKC0f0eAzdmuz5wsIW6Tx1QPjlrjrwHuPE6rGaF4A9pP8Q/+GsU/7ysUb1S1N71TKP6xZ9PcG7xZJ9VCvvZcLodqr8n6ZVSZr34Z0+F/Yax928q8jf4+Yy27XNbq36RZtQ+GX4qIP56o8nQJDjMz64y/iYjLx6sw1YPjKNPjE42Z2WQQwMGIOHW8SlO9V9X/AD7FWw+RzcysNbXTZuNXmspHHDWS7gc+TLHR9T3Faufuehj7fOHxPn/YrMnSjqqaafdE+/xEnovv1Pz1y5hoeeXz2XRg3Wadthc4NSJOG6/StAgOAEm/D3yZIiC6rfYGUeuNsx14GLgpIp4vV5T0UeC8iLhN0tnAr1FcgP874C+B32S0dw8UF+vfAF6k6AbaD7yd0cD8EfAM8CsUvYheAs7n2KPLiZbzLPBfKO5fMyPb8gmKDgEBbAF+OyJ+ONZOyG2bQ3HOdE/+nsqPI+LBseYZYznLKLpZ13qa
PRMRg1WWUbe8AeAnwOKI+G9Z1g8siohvSprF2Pt+BvA8Re+nc3jrfj1K0UHiRYpeW3803v4prfffAY9QdLj4EkVPtFoPtkPAC8BdFN2lvwmsovg7nAP05TpnUHTemEXx2jtM0Ymj1u5eiq7Db2adNyn+xrNz3rdT7N83KHqynUbR0+v0XP+ZWT6b0Z5xBxl9PSjLD+Uyeik6jLye++ptFB1MzgAeAn4MXEzRzflAqc1PU3RtrnWDfrk0/3C2vZfi7/cPuQxRvM5+yOhruIfRnoEvZvtHsn1nM9r78PVc1wyKHm7nZDuO5rQzKHq/zcp1ziq1aV+u60hOW5D7Tbmu03K5ym2cmds0M/cduV21DxC138w4kOvpybq1noIjjPZoPMJo78vDWf7T3OdP5n6pvf/05PTyPtkDvErRk3Fhbst/z306OyK+zTimU3A8RdFlsR2NPjH+tFH6lt/wc/wTFP8Y/wq4usFyzI6HoHjTuosi5D9NEbIbKcLvVym6g/82xZv+qRRvMC8DSxkNgdqRUFC88f846zxFcTT/c4x2Df8JRcidR/Fm9kTOswC4n6Ib+lGKDz/bKboVf5zizbu2noPZjgMU3a2PAP+M0Tf0Hoo3ttqb9Z9SfHB5P6MhM5O3dj0/TPGhaxajb+CUpr8G3A4MUATMKRTBqFxvLTAPU4TQSC5rFkVAvk4RPKfmNtxF8Ya7KLd5Uc6zhyKAjuT+2A58s9atX9IzjH442ELxYe9C4O78e3ws91ktVI5m2/93tuGXGA3cJbkv91GE5wxgMOc/O7f9eeDbFL+e2gs8R/Hh8r3AP8n1vAF8B7g6mgiFaREcvkg+ZU3VU3NmrRrvNT8Z/h8OAj+MiF8Yr9JUvzheM9bO/s6JbIRV1u1/ErMTbbzX/GT4f5hFcTQzril/y5G0n9FDyf8D3Af8VUS8IuknjH7buHbot4fiMPJ0ikPLwxSHrmZmNoFpccQREXMiQhExOyJ+IyLW1u47FRFn5LRTIqInImZExNsjYkHONz8i3nECm3s3nek+/FgHlgGwpkPLGQE+x+h58nbsYPRC4XTSiX0zlXRiW6fD/ppqr+UjE1WYFsHRIb/F2H/gEeBGWvvHj5z/KMWRzW8yepRzOIePlMZrj4lebBeNs74qbqpYv+xIPmq9N/4zo/voaOlRP17/qLeU9l6btfWNUBxpHq57PpTTj5TK/hfFEevhNtY7keN1C47yPq/9/UcoLmA32r/1dRupTW/m9fSTMdYzSHFRdyy1duyh6AZav67DFBeOaz2GosGjfnmHG5TXHCi1szx/o+eDFPuwkYOMblttfbXXW/1+eBn4W4qLz43aW3+blvHUXreN7Bxj/lrb6tW/x4xQXKj/EEXvqnFNi4vjx5ukVyleuKeWnmtd5faWyk9jtOeI2Ylya0Rc3+1GnAilLutnUARW7fk14HvAR3Kc+jrZ1frsUp0zSnUfqb+Ddofa+88peqTNzjbOznWeSXGKvTYuYDAinpZ0TkTsKS3j94CdEfHXOb4sIp5ucv3nAP+Stwb47Ii4va3tcnBMzPfEsmmk1n32l7vYhnZuGjiZ1X7O4QBFKFxAERBTyQHgYxHx0HiVHBzJ4WBmBsC+iJg3XgVf4zAr+IODWeGUiSpMl+64ndDJw+ax7sF/gOJaSO0WELWLb72lskMU1072U1xQe5niB4z2Zd0+ilsKnEVxq4SZudyDFN/2rXU9XpT1e/JxIB+LKL6hW/+hoXwRu6fB9JrXGT38/nFELMrbZlzPsdeBas+v57bVT3snxe1QzmH09hi1WzHUX9CbwehF+NrtOGYw+m3mNynOFT9HcVH4udJ+q3+mQdmXgeuAZbn9i4D6L0Ed4dhb2tQuptfKa12+a/tvrIu45WmzONZI3XJgep3Wma7afR85SOPXw6TiU1VmXSbpAxS3hJjo1yxr98pq50xBrfdPeTkn+szDZL22Ufvw1qz6m1Y+T9G7aYTiA9F8qu/bF4Cxvh5Qvjlm
/fB4PxY1Xtsb+XZEXDHezD7iMOuyiPi/jHERVdL20ujsrH/MG0uDnn/Q+Oiv0bPrdqbuXIrTPLXx1xm9Y3f5rEKj5x6KMwzK+Q6UljOP4kxE7X5aZ2adZtp7OqM/91y7yWEvb+0NWv/8YybgIw6zSaxRp42IOOaTojt3WAcdiYhxDyocHGZd5jd9m2wafTgpc68qs8npJ4xeyDc7kSa8RYqDw6z7areSKN8GZEv2pR8pPcw6rdHtVnZONJNPVZmZWSU+4jAzs0ocHGZmVomDw8zMKnFwmJlZJQ4OMzOr5P8Dt+310L7a5NEAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "user_features['device_name'].value_counts().plot(kind='bar')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Video features"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-20T13:24:01.649356Z",
     "iopub.status.busy": "2021-07-20T13:24:01.648787Z",
     "iopub.status.idle": "2021-07-20T13:24:01.662032Z",
     "shell.execute_reply": "2021-07-20T13:24:01.661440Z",
     "shell.execute_reply.started": "2021-07-20T13:24:01.649310Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>video_id</th>\n",
       "      <th>video_name</th>\n",
       "      <th>video_tags</th>\n",
       "      <th>video_description</th>\n",
       "      <th>video_release_date</th>\n",
       "      <th>video_director_list</th>\n",
       "      <th>video_actor_list</th>\n",
       "      <th>video_score</th>\n",
       "      <th>video_second_class</th>\n",
       "      <th>video_duration</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>3460</td>\n",
       "      <td>脱皮爸爸</td>\n",
       "      <td>院线电影,家庭关系,命运</td>\n",
       "      <td>中年失意的儿子田力行（古天乐饰）在生活上遇到了重重危机：母亲病逝,工作不顺,妻子要求离婚。正...</td>\n",
       "      <td>2017-04-27</td>\n",
       "      <td>司徒慧焯</td>\n",
       "      <td>吴镇宇,古天乐,春夏,蔡洁</td>\n",
       "      <td>7.398438</td>\n",
       "      <td>剧情,喜剧,奇幻</td>\n",
       "      <td>5913</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>14553</td>\n",
       "      <td>喜气洋洋小金莲</td>\n",
       "      <td>古装喜剧,剧情片,喜剧片,内地电影,欢乐喜剧,爱情纠纷</td>\n",
       "      <td>故事始于西门庆为西门药业的“伟哥”产品寻找代言人，西门庆初见潘金莲，一时惊为天人，为成功抱得...</td>\n",
       "      <td>2015-12-30</td>\n",
       "      <td>杨珊珊,李亚玲</td>\n",
       "      <td>陈南飞,程隆妮,王闯,贾海涛,闫薇儿</td>\n",
       "      <td>5.601562</td>\n",
       "      <td>喜剧</td>\n",
       "      <td>6217</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1214</td>\n",
       "      <td>风流家族</td>\n",
       "      <td>男女关系,家庭关系,命运,院线电影</td>\n",
       "      <td>香世仁（钟镇涛 饰）是家财万贯的香港富豪，在满足了一切物质上的要求后，他将生活的重心放在了儿...</td>\n",
       "      <td>2002-03-07</td>\n",
       "      <td>邱礼涛,杨漪珊</td>\n",
       "      <td>张家辉,卢巧音,钟镇涛,叶童,李蕙敏,张坚庭,袁洁莹,黄佩霞,齐芷瑶,刘以达,叶伟信,邹凯光...</td>\n",
       "      <td>6.800781</td>\n",
       "      <td>都市,喜剧,爱情,家庭</td>\n",
       "      <td>5963</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>30639</td>\n",
       "      <td>大提琴的故事</td>\n",
       "      <td>短片,动画片</td>\n",
       "      <td>低音大提琴演奏家史密斯科夫正要去参加某贵族的沙龙，途中他被河边的美丽景色所吸引，驻足观看。兴...</td>\n",
       "      <td>1949-01-01</td>\n",
       "      <td>伊里·特恩卡,契诃夫</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>动画,爱情</td>\n",
       "      <td>17371</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>38522</td>\n",
       "      <td>歌舞大王齐格飞</td>\n",
       "      <td>喜剧片,人物传记,浪漫爱情</td>\n",
       "      <td>罗伯特．Z．伦纳德导演的这部影片以百老汇最大的歌舞团——齐格菲歌舞团的创办人佛罗伦斯．齐格菲...</td>\n",
       "      <td>1936-04-08</td>\n",
       "      <td>罗伯特·Z·伦纳德,William Anthony McGuire</td>\n",
       "      <td>威廉·鲍威尔,玛娜·洛伊,路易丝·赖纳,弗兰克·摩根,范妮·布莱斯,弗吉尼亚·布鲁斯,雷吉纳...</td>\n",
       "      <td>7.699219</td>\n",
       "      <td>剧情,歌舞,喜剧</td>\n",
       "      <td>10608</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   video_id video_name                   video_tags  \\\n",
       "0      3460       脱皮爸爸                 院线电影,家庭关系,命运   \n",
       "1     14553    喜气洋洋小金莲  古装喜剧,剧情片,喜剧片,内地电影,欢乐喜剧,爱情纠纷   \n",
       "2      1214       风流家族            男女关系,家庭关系,命运,院线电影   \n",
       "3     30639     大提琴的故事                       短片,动画片   \n",
       "4     38522    歌舞大王齐格飞                喜剧片,人物传记,浪漫爱情   \n",
       "\n",
       "                                   video_description video_release_date  \\\n",
       "0  中年失意的儿子田力行（古天乐饰）在生活上遇到了重重危机：母亲病逝,工作不顺,妻子要求离婚。正...         2017-04-27   \n",
       "1  故事始于西门庆为西门药业的“伟哥”产品寻找代言人，西门庆初见潘金莲，一时惊为天人，为成功抱得...         2015-12-30   \n",
       "2  香世仁（钟镇涛 饰）是家财万贯的香港富豪，在满足了一切物质上的要求后，他将生活的重心放在了儿...         2002-03-07   \n",
       "3  低音大提琴演奏家史密斯科夫正要去参加某贵族的沙龙，途中他被河边的美丽景色所吸引，驻足观看。兴...         1949-01-01   \n",
       "4  罗伯特．Z．伦纳德导演的这部影片以百老汇最大的歌舞团——齐格菲歌舞团的创办人佛罗伦斯．齐格菲...         1936-04-08   \n",
       "\n",
       "                 video_director_list  \\\n",
       "0                               司徒慧焯   \n",
       "1                            杨珊珊,李亚玲   \n",
       "2                            邱礼涛,杨漪珊   \n",
       "3                         伊里·特恩卡,契诃夫   \n",
       "4  罗伯特·Z·伦纳德,William Anthony McGuire   \n",
       "\n",
       "                                    video_actor_list  video_score  \\\n",
       "0                                      吴镇宇,古天乐,春夏,蔡洁     7.398438   \n",
       "1                                 陈南飞,程隆妮,王闯,贾海涛,闫薇儿     5.601562   \n",
       "2  张家辉,卢巧音,钟镇涛,叶童,李蕙敏,张坚庭,袁洁莹,黄佩霞,齐芷瑶,刘以达,叶伟信,邹凯光...     6.800781   \n",
       "3                                                NaN          NaN   \n",
       "4  威廉·鲍威尔,玛娜·洛伊,路易丝·赖纳,弗兰克·摩根,范妮·布莱斯,弗吉尼亚·布鲁斯,雷吉纳...     7.699219   \n",
       "\n",
       "  video_second_class  video_duration  \n",
       "0           剧情,喜剧,奇幻            5913  \n",
       "1                 喜剧            6217  \n",
       "2        都市,喜剧,爱情,家庭            5963  \n",
       "3              动画,爱情           17371  \n",
       "4           剧情,歌舞,喜剧           10608  "
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "video_features.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## User behavior"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-20T13:24:03.875082Z",
     "iopub.status.busy": "2021-07-20T13:24:03.874557Z",
     "iopub.status.idle": "2021-07-20T13:24:10.120488Z",
     "shell.execute_reply": "2021-07-20T13:24:10.119742Z",
     "shell.execute_reply.started": "2021-07-20T13:24:03.875037Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(80276856, 9) 3953209\n",
      "(80276856, 9) 34218\n",
      "(80276856, 9) 2\n",
      "(80276856, 9) 2\n",
      "(80276856, 9) 2\n",
      "(80276856, 9) 2\n",
      "(80276856, 9) 91\n",
      "(80276856, 9) 10\n",
      "(80276856, 9) 14\n"
     ]
    }
   ],
   "source": [
    "# Print the table shape and the number of distinct values in each column\n",
    "for col in history_behavior.columns:\n",
    "    print(history_behavior.shape, history_behavior[col].nunique())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-20T13:24:41.553858Z",
     "iopub.status.busy": "2021-07-20T13:24:41.553309Z",
     "iopub.status.idle": "2021-07-20T13:24:48.063561Z",
     "shell.execute_reply": "2021-07-20T13:24:48.063078Z",
     "shell.execute_reply.started": "2021-07-20T13:24:41.553826Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "422143"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Number of test users that also appear in the behavior history\n",
    "len(set(test_data['user_id']) & set(history_behavior['user_id']))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-20T13:25:10.205084Z",
     "iopub.status.busy": "2021-07-20T13:25:10.204512Z",
     "iopub.status.idle": "2021-07-20T13:25:10.268416Z",
     "shell.execute_reply": "2021-07-20T13:25:10.267965Z",
     "shell.execute_reply.started": "2021-07-20T13:25:10.205037Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "492174"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Total number of distinct users in the test set\n",
    "test_data['user_id'].nunique()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-08-15T18:54:55.676563Z",
     "iopub.status.busy": "2021-08-15T18:54:55.675916Z",
     "iopub.status.idle": "2021-08-15T18:55:01.581983Z",
     "shell.execute_reply": "2021-08-15T18:55:01.581055Z",
     "shell.execute_reply.started": "2021-08-15T18:54:55.676514Z"
    },
    "tags": []
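   },
   "outputs": [],
   "source": [
    "# Keep only the behavior of users who appear in the test set\n",
    "history_behavior = history_behavior[history_behavior['user_id'].isin(test_data['user_id'].unique())]\n",
    "# Hold out the day pt_d == 20210502 as the label day; all other days provide training rows\n",
    "val_behavior = history_behavior[history_behavior['pt_d'] == 20210502]\n",
    "train_behavior = history_behavior[history_behavior['pt_d'] != 20210502]\n",
    "\n",
    "# Rename the held-out targets so they do not clash with the training columns after merging\n",
    "val_behavior = val_behavior.rename(columns={\"watch_label\": \"val_watch\", \"is_share\": \"val_share\"})"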
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-08-15T18:55:01.583115Z",
     "iopub.status.busy": "2021-08-15T18:55:01.582820Z",
     "iopub.status.idle": "2021-08-15T18:55:09.310343Z",
     "shell.execute_reply": "2021-08-15T18:55:09.309270Z",
     "shell.execute_reply.started": "2021-08-15T18:55:01.583097Z"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Attach the held-out labels to each training row by (user_id, video_id)\n",
    "train_behavior = pd.merge(train_behavior,\n",
    "                          val_behavior[['user_id', 'video_id', 'val_watch', 'val_share']],\n",
    "                          on=['user_id', 'video_id'], how='left')\n",
    "\n",
    "# Pairs with no record on the held-out day get label 0\n",
    "train_behavior['val_watch'] = train_behavior['val_watch'].fillna(0)\n",
    "train_behavior['val_share'] = train_behavior['val_share'].fillna(0)\n",
    "\n",
    "# Downsample rows with val_watch == 0 to 50,000 and keep every row with a nonzero label\n",
    "train_behavior = pd.concat([\n",
    "    train_behavior[train_behavior['val_watch'] == 0].sample(50000),\n",
    "    train_behavior[train_behavior['val_watch'] != 0]\n",
    "])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-08-15T18:55:09.311524Z",
     "iopub.status.busy": "2021-08-15T18:55:09.311347Z",
     "iopub.status.idle": "2021-08-15T18:55:09.316939Z",
     "shell.execute_reply": "2021-08-15T18:55:09.316499Z",
     "shell.execute_reply.started": "2021-08-15T18:55:09.311508Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.0    50000\n",
       "1.0     8051\n",
       "2.0     4787\n",
       "9.0     4694\n",
       "3.0     3821\n",
       "4.0     2983\n",
       "5.0     2642\n",
       "6.0     2249\n",
       "8.0     2197\n",
       "7.0     1915\n",
       "Name: val_watch, dtype: int64"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train_behavior['val_watch'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-20T13:26:06.029891Z",
     "iopub.status.busy": "2021-07-20T13:26:06.029339Z",
     "iopub.status.idle": "2021-07-20T13:26:06.085468Z",
     "shell.execute_reply": "2021-07-20T13:26:06.084881Z",
     "shell.execute_reply.started": "2021-07-20T13:26:06.029846Z"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Per-user statistics: row count, distinct videos, watch and share rate/max, distinct watch labels\n",
    "train_user_behavior_agg = train_behavior.groupby('user_id').agg({\n",
    "    'pt_d': ['count'],\n",
    "    'video_id': ['nunique'],\n",
    "    'is_watch': ['mean', 'max'],\n",
    "    'is_share': ['mean', 'max'],\n",
    "    'watch_label': ['nunique']\n",
    "})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-20T13:26:07.398382Z",
     "iopub.status.busy": "2021-07-20T13:26:07.397799Z",
     "iopub.status.idle": "2021-07-20T13:26:07.405755Z",
     "shell.execute_reply": "2021-07-20T13:26:07.405063Z",
     "shell.execute_reply.started": "2021-07-20T13:26:07.398335Z"
    },
    "tags": []
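   },
   "outputs": [],
   "source": [
    "# Flatten the two-level column index into names such as is_watch_MEAN\n",
    "train_user_behavior_agg.columns = pd.Index([e[0] + \"_\" + e[1].upper()\n",
    "                for e in train_user_behavior_agg.columns.tolist()])\n",
    "# Turn user_id from the index back into a regular column\n",
    "train_user_behavior_agg = train_user_behavior_agg.reset_index()"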
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-20T13:26:08.862489Z",
     "iopub.status.busy": "2021-07-20T13:26:08.861931Z",
     "iopub.status.idle": "2021-07-20T13:26:08.905140Z",
     "shell.execute_reply": "2021-07-20T13:26:08.904690Z",
     "shell.execute_reply.started": "2021-07-20T13:26:08.862442Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>user_id</th>\n",
       "      <th>video_id</th>\n",
       "      <th>is_watch</th>\n",
       "      <th>is_share</th>\n",
       "      <th>is_collect</th>\n",
       "      <th>is_comment</th>\n",
       "      <th>watch_start_time</th>\n",
       "      <th>watch_label</th>\n",
       "      <th>pt_d</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>3275584</th>\n",
       "      <td>2</td>\n",
       "      <td>25469</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>2021-04-18</td>\n",
       "      <td>3</td>\n",
       "      <td>20210419</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5290331</th>\n",
       "      <td>2</td>\n",
       "      <td>25469</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>20210421</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2993539</th>\n",
       "      <td>2</td>\n",
       "      <td>25469</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>20210502</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         user_id  video_id  is_watch  is_share  is_collect  is_comment  \\\n",
       "3275584        2     25469         1         0           0           0   \n",
       "5290331        2     25469         0         0           0           0   \n",
       "2993539        2     25469         0         0           0           0   \n",
       "\n",
       "        watch_start_time  watch_label      pt_d  \n",
       "3275584       2021-04-18            3  20210419  \n",
       "5290331              NaN            0  20210421  \n",
       "2993539              NaN            0  20210502  "
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Inspect how a single (user, video) pair appears across days\n",
    "history_behavior[(history_behavior['user_id'] == 2) &\n",
    "                 (history_behavior['video_id'] == 25469)]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-08-15T18:55:18.717183Z",
     "iopub.status.busy": "2021-08-15T18:55:18.716606Z",
     "iopub.status.idle": "2021-08-15T18:55:36.795881Z",
     "shell.execute_reply": "2021-08-15T18:55:36.794635Z",
     "shell.execute_reply.started": "2021-08-15T18:55:18.717133Z"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import torch\n",
    "import torch.nn.functional as F\n",
    "from torch import nn\n",
    "from torch.utils.data import Dataset, DataLoader\n",
    "\n",
    "torch.manual_seed(0)\n",
    "\n",
    "\n",
    "# Two-headed MLP over concatenated user and video embeddings:\n",
    "# one head scores the 10 watch_label classes, the other scores sharing.\n",
    "class MLP(nn.Module):\n",
    "\n",
    "    def __init__(self, n_users=5910794, n_items=50352, layers=(64, 32), dropout=True):\n",
    "        super().__init__()\n",
    "        # Two 32-dim embeddings are concatenated, so layers[0] must be 64.\n",
    "        self.user_embedding = nn.Embedding(n_users, 32)\n",
    "        self.video_embedding = nn.Embedding(n_items, 32)\n",
    "        self.dropout = dropout\n",
    "\n",
    "        # Hidden fully connected layers\n",
    "        self.fc_layers = nn.ModuleList()\n",
    "        for in_size, out_size in zip(layers[:-1], layers[1:]):\n",
    "            self.fc_layers.append(nn.Linear(in_size, out_size))\n",
    "        self.output_layer1 = nn.Linear(layers[-1], 10)  # watch_label logits (classes 0-9)\n",
    "        self.output_layer2 = nn.Linear(layers[-1], 1)   # is_share logit\n",
    "\n",
    "    def forward(self, feed_dict):\n",
    "        users = feed_dict['user_id']\n",
    "        items = feed_dict['video_id']\n",
    "\n",
    "        user_embedding = self.user_embedding(users)\n",
    "        video_embedding = self.video_embedding(items)\n",
    "\n",
    "        x = torch.cat([user_embedding, video_embedding], 1)\n",
    "        for fc in self.fc_layers:\n",
    "            x = F.relu(fc(x))\n",
    "            if self.dropout:\n",
    "                # F.dropout defaults to training=True, so pass the module's mode explicitly.\n",
    "                x = F.dropout(x, training=self.training)\n",
    "        logit1 = self.output_layer1(x)\n",
    "        logit2 = self.output_layer2(x)\n",
    "        return logit1, logit2\n",
    "\n",
    "    def predict(self, feed_dict):\n",
    "        # Convert NumPy inputs to long tensors on the model's device without mutating the caller's dict.\n",
    "        device = next(self.parameters()).device\n",
    "        tensors = {\n",
    "            key: torch.from_numpy(value).to(dtype=torch.long, device=device)\n",
    "            for key, value in feed_dict.items() if value is not None\n",
    "        }\n",
    "        return self.forward(tensors)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-08-15T18:55:36.796952Z",
     "iopub.status.busy": "2021-08-15T18:55:36.796764Z",
     "iopub.status.idle": "2021-08-15T18:55:38.638511Z",
     "shell.execute_reply": "2021-08-15T18:55:38.637858Z",
     "shell.execute_reply.started": "2021-08-15T18:55:36.796936Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(tensor([[ 0.0623,  0.5450, -0.2110, -0.1875,  0.6020,  0.0564, -0.0947,  0.2138,\n",
       "           0.2722,  0.4054],\n",
       "         [ 0.0419, -0.1215,  0.0942,  0.4227,  0.2295,  0.3764,  0.2028,  0.5010,\n",
       "           0.4307, -0.2037]], grad_fn=<AddmmBackward>),\n",
       " tensor([[-0.5280],\n",
       "         [-0.2592]], grad_fn=<AddmmBackward>))"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
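   ],
   "source": [
    "# Smoke test on CPU: the two rows are identical inputs, and their outputs differ\n",
    "# only because dropout is active while the model is in training mode.\n",
    "model = MLP()\n",
    "model.predict({'user_id': np.array([10, 10]), 'video_id': np.array([10, 10])})"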
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-08-15T18:55:40.085278Z",
     "iopub.status.busy": "2021-08-15T18:55:40.084705Z",
     "iopub.status.idle": "2021-08-15T18:56:24.001514Z",
     "shell.execute_reply": "2021-08-15T18:56:24.000592Z",
     "shell.execute_reply.started": "2021-08-15T18:55:40.085229Z"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "model = model.cuda()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-08-15T18:56:28.085535Z",
     "iopub.status.busy": "2021-08-15T18:56:28.084956Z",
     "iopub.status.idle": "2021-08-15T18:56:28.095418Z",
     "shell.execute_reply": "2021-08-15T18:56:28.094409Z",
     "shell.execute_reply.started": "2021-08-15T18:56:28.085487Z"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "class MLPDataset(Dataset):\n",
    "    # Yields (user_id, video_id) pairs, plus the watch and share labels when train=True.\n",
    "    def __init__(self, history_behavior, train=True):\n",
    "        self.history_behavior = history_behavior\n",
    "        self.train = train\n",
    "\n",
    "    def __getitem__(self, index):\n",
    "        # Fetch the row once instead of re-indexing the DataFrame for every field\n",
    "        row = self.history_behavior.iloc[index]\n",
    "        user_id = row['user_id']\n",
    "        video_id = row['video_id']\n",
    "\n",
    "        if self.train:\n",
    "            # val_watch and val_share are floats after the merge, so cast them back to int\n",
    "            watch_label = int(row['val_watch'])\n",
    "            share_label = int(row['val_share'])\n",
    "\n",
    "            # Watch label: scalar class index. Share label: shape (1,), matching the single-logit head.\n",
    "            return user_id, video_id, \\\n",
    "                torch.from_numpy(np.array(watch_label)), \\\n",
    "                torch.from_numpy(np.array([share_label]))\n",
    "        else:\n",
    "            return user_id, video_id\n",
    "\n",
    "    def __len__(self):\n",
    "        return len(self.history_behavior)\n",
    "\n",
    "\n",
    "train_loader = DataLoader(\n",
    "    dataset=MLPDataset(train_behavior),\n",
    "    batch_size=20, shuffle=True, num_workers=5,\n",
    ")\n",
    "\n",
    "test_loader = DataLoader(\n",
    "    dataset=MLPDataset(test_data, train=False),\n",
    "    batch_size=20, shuffle=False, num_workers=5,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-08-15T18:56:29.647141Z",
     "iopub.status.busy": "2021-08-15T18:56:29.646799Z",
     "iopub.status.idle": "2021-08-15T19:07:54.547490Z",
     "shell.execute_reply": "2021-08-15T19:07:54.546982Z",
     "shell.execute_reply.started": "2021-08-15T18:56:29.647113Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0/4167 \t 3.0385992527008057\t0.0 tensor(0., device='cuda:0')\n",
      "100/4167 \t 2.015671491622925\t0.0 tensor(10., device='cuda:0')\n",
      "200/4167 \t 1.9396910667419434\t0.0 tensor(12., device='cuda:0')\n",
      "300/4167 \t 1.4700719118118286\t0.0 tensor(17., device='cuda:0')\n",
      "400/4167 \t 2.0144922733306885\t0.0 tensor(10., device='cuda:0')\n",
      "500/4167 \t 1.876318097114563\t0.0 tensor(9., device='cuda:0')\n",
      "600/4167 \t 1.8626956939697266\t0.0 tensor(13., device='cuda:0')\n",
      "700/4167 \t 1.7240959405899048\t0.0 tensor(13., device='cuda:0')\n",
      "800/4167 \t 2.4145619869232178\t0.0 tensor(9., device='cuda:0')\n",
      "900/4167 \t 1.7381845712661743\t0.0 tensor(13., device='cuda:0')\n",
      "1000/4167 \t 1.6771737337112427\t0.0 tensor(12., device='cuda:0')\n",
      "1100/4167 \t 1.7437734603881836\t0.0 tensor(13., device='cuda:0')\n",
      "1200/4167 \t 1.844527244567871\t0.0 tensor(11., device='cuda:0')\n",
      "1300/4167 \t 1.6063880920410156\t0.0 tensor(14., device='cuda:0')\n",
      "1400/4167 \t 1.991291880607605\t0.0 tensor(12., device='cuda:0')\n",
      "1500/4167 \t 1.982926607131958\t0.0 tensor(14., device='cuda:0')\n",
      "1600/4167 \t 1.251118779182434\t0.0 tensor(17., device='cuda:0')\n",
      "1700/4167 \t 2.015336751937866\t0.0 tensor(10., device='cuda:0')\n",
      "1800/4167 \t 2.1274473667144775\t0.0 tensor(9., device='cuda:0')\n",
      "1900/4167 \t 2.0307750701904297\t0.0 tensor(9., device='cuda:0')\n",
      "2000/4167 \t 1.9513195753097534\t0.0 tensor(12., device='cuda:0')\n",
      "2100/4167 \t 1.7441174983978271\t0.0 tensor(13., device='cuda:0')\n",
      "2200/4167 \t 1.5295333862304688\t0.0 tensor(15., device='cuda:0')\n",
      "2300/4167 \t 1.976839542388916\t0.0 tensor(10., device='cuda:0')\n",
      "2400/4167 \t 1.7033637762069702\t0.0 tensor(12., device='cuda:0')\n",
      "2500/4167 \t 1.931984543800354\t0.0 tensor(12., device='cuda:0')\n",
      "2600/4167 \t 1.4458104372024536\t0.0 tensor(13., device='cuda:0')\n",
      "2700/4167 \t 1.8229187726974487\t0.0 tensor(12., device='cuda:0')\n",
      "2800/4167 \t 2.0994584560394287\t0.0 tensor(9., device='cuda:0')\n",
      "2900/4167 \t 2.450005292892456\t0.0 tensor(6., device='cuda:0')\n",
      "3000/4167 \t 1.9102333784103394\t0.0 tensor(11., device='cuda:0')\n",
      "3100/4167 \t 1.6600546836853027\t0.0 tensor(13., device='cuda:0')\n",
      "3200/4167 \t 1.6217273473739624\t0.0 tensor(13., device='cuda:0')\n",
      "3300/4167 \t 2.009052038192749\t0.10000000149011612 tensor(11., device='cuda:0')\n",
      "3400/4167 \t 1.4940425157546997\t0.0 tensor(14., device='cuda:0')\n",
      "3500/4167 \t 2.027090072631836\t0.09090909361839294 tensor(10., device='cuda:0')\n",
      "3600/4167 \t 2.0768725872039795\t0.0 tensor(10., device='cuda:0')\n",
      "3700/4167 \t 1.5510739088058472\t0.1428571492433548 tensor(14., device='cuda:0')\n",
      "3800/4167 \t 1.6096843481063843\t0.0 tensor(13., device='cuda:0')\n",
      "3900/4167 \t 1.7758731842041016\t0.0 tensor(13., device='cuda:0')\n",
      "4000/4167 \t 1.6091035604476929\t0.0 tensor(13., device='cuda:0')\n",
      "4100/4167 \t 1.6933733224868774\t0.0 tensor(14., device='cuda:0')\n",
      "0/4167 \t 2.2174713611602783\t0.0 tensor(9., device='cuda:0')\n",
      "100/4167 \t 1.906925082206726\t0.1111111119389534 tensor(10., device='cuda:0')\n",
      "200/4167 \t 2.214087963104248\t0.0 tensor(10., device='cuda:0')\n",
      "300/4167 \t 1.8731956481933594\t0.0 tensor(10., device='cuda:0')\n",
      "400/4167 \t 1.7455919981002808\t0.0 tensor(12., device='cuda:0')\n",
      "500/4167 \t 2.1751413345336914\t0.1666666716337204 tensor(7., device='cuda:0')\n",
      "600/4167 \t 2.079425096511841\t0.0 tensor(12., device='cuda:0')\n",
      "700/4167 \t 1.9789507389068604\t0.0 tensor(9., device='cuda:0')\n",
      "800/4167 \t 1.905069351196289\t0.0 tensor(12., device='cuda:0')\n",
      "900/4167 \t 1.57570219039917\t0.0 tensor(12., device='cuda:0')\n",
      "1000/4167 \t 1.677005648612976\t0.0 tensor(13., device='cuda:0')\n",
      "1100/4167 \t 1.7934160232543945\t0.0 tensor(13., device='cuda:0')\n",
      "1200/4167 \t 1.5482215881347656\t0.0 tensor(16., device='cuda:0')\n",
      "1300/4167 \t 1.793740153312683\t0.0 tensor(13., device='cuda:0')\n",
      "1400/4167 \t 1.5634182691574097\t0.2857142984867096 tensor(14., device='cuda:0')\n",
      "1500/4167 \t 1.967103362083435\t0.0 tensor(10., device='cuda:0')\n",
      "1600/4167 \t 1.3834744691848755\t0.0 tensor(15., device='cuda:0')\n",
      "1700/4167 \t 1.7580095529556274\t0.1111111119389534 tensor(12., device='cuda:0')\n",
      "1800/4167 \t 2.078411340713501\t0.0 tensor(10., device='cuda:0')\n",
      "1900/4167 \t 1.093076467514038\t0.4000000059604645 tensor(17., device='cuda:0')\n",
      "2000/4167 \t 2.2647480964660645\t0.0 tensor(8., device='cuda:0')\n",
      "2100/4167 \t 2.088059425354004\t0.0 tensor(9., device='cuda:0')\n",
      "2200/4167 \t 1.6408635377883911\t0.1428571492433548 tensor(13., device='cuda:0')\n",
      "2300/4167 \t 1.6240745782852173\t0.2222222238779068 tensor(11., device='cuda:0')\n",
      "2400/4167 \t 1.7043203115463257\t0.0 tensor(12., device='cuda:0')\n",
      "2500/4167 \t 1.3650010824203491\t0.1428571492433548 tensor(14., device='cuda:0')\n",
      "2600/4167 \t 1.2064779996871948\t0.25 tensor(15., device='cuda:0')\n",
      "2700/4167 \t 1.3761483430862427\t0.1666666716337204 tensor(14., device='cuda:0')\n",
      "2800/4167 \t 1.707099199295044\t0.125 tensor(13., device='cuda:0')\n",
      "2900/4167 \t 1.8620107173919678\t0.0 tensor(12., device='cuda:0')\n",
      "3000/4167 \t 1.324867606163025\t0.2857142984867096 tensor(15., device='cuda:0')\n",
      "3100/4167 \t 1.1015827655792236\t0.5 tensor(17., device='cuda:0')\n",
      "3200/4167 \t 1.3846371173858643\t0.3333333432674408 tensor(14., device='cuda:0')\n",
      "3300/4167 \t 1.5472047328948975\t0.1428571492433548 tensor(14., device='cuda:0')\n",
      "3400/4167 \t 2.018615961074829\t0.1666666716337204 tensor(9., device='cuda:0')\n",
      "3500/4167 \t 1.6112260818481445\t0.125 tensor(13., device='cuda:0')\n",
      "3600/4167 \t 1.3316879272460938\t0.5 tensor(14., device='cuda:0')\n",
      "3700/4167 \t 1.7145580053329468\t0.10000000149011612 tensor(11., device='cuda:0')\n",
      "3800/4167 \t 1.6623563766479492\t0.25 tensor(13., device='cuda:0')\n",
      "3900/4167 \t 1.6577060222625732\t0.0 tensor(13., device='cuda:0')\n",
      "4000/4167 \t 1.4074698686599731\t0.1666666716337204 tensor(14., device='cuda:0')\n",
      "4100/4167 \t 1.7505559921264648\t0.0 tensor(12., device='cuda:0')\n",
      "0/4167 \t 1.4648463726043701\t0.125 tensor(12., device='cuda:0')\n",
      "100/4167 \t 1.505132794380188\t0.125 tensor(12., device='cuda:0')\n",
      "200/4167 \t 1.5703130960464478\t0.1111111119389534 tensor(12., device='cuda:0')\n",
      "300/4167 \t 1.3917161226272583\t0.4000000059604645 tensor(14., device='cuda:0')\n",
      "400/4167 \t 1.372841477394104\t0.1666666716337204 tensor(14., device='cuda:0')\n",
      "500/4167 \t 1.7805626392364502\t0.25 tensor(11., device='cuda:0')\n",
      "600/4167 \t 1.003525733947754\t0.0 tensor(15., device='cuda:0')\n",
      "700/4167 \t 0.8478943705558777\t0.5714285969734192 tensor(17., device='cuda:0')\n",
      "800/4167 \t 1.263153076171875\t0.1428571492433548 tensor(13., device='cuda:0')\n",
      "900/4167 \t 1.5138713121414185\t0.4166666567325592 tensor(12., device='cuda:0')\n",
      "1000/4167 \t 1.6789288520812988\t0.0 tensor(10., device='cuda:0')\n",
      "1100/4167 \t 1.1838958263397217\t0.0 tensor(15., device='cuda:0')\n",
      "1200/4167 \t 1.7300399541854858\t0.0 tensor(11., device='cuda:0')\n",
      "1300/4167 \t 1.6119672060012817\t0.25 tensor(11., device='cuda:0')\n",
      "1400/4167 \t 1.4988842010498047\t0.30000001192092896 tensor(13., device='cuda:0')\n",
      "1500/4167 \t 0.8634194731712341\t0.3333333432674408 tensor(16., device='cuda:0')\n",
      "1600/4167 \t 1.736411452293396\t0.0 tensor(9., device='cuda:0')\n",
      "1700/4167 \t 1.9391024112701416\t0.125 tensor(13., device='cuda:0')\n",
      "1800/4167 \t 1.4707306623458862\t0.1428571492433548 tensor(14., device='cuda:0')\n",
      "1900/4167 \t 0.9352970719337463\t0.3333333432674408 tensor(16., device='cuda:0')\n",
      "2000/4167 \t 1.946927785873413\t0.2222222238779068 tensor(12., device='cuda:0')\n",
      "2100/4167 \t 1.8103584051132202\t0.1818181872367859 tensor(10., device='cuda:0')\n",
      "2200/4167 \t 1.449629306793213\t0.125 tensor(13., device='cuda:0')\n",
      "2300/4167 \t 1.0811893939971924\t0.0 tensor(13., device='cuda:0')\n",
      "2400/4167 \t 2.116546869277954\t0.10000000149011612 tensor(11., device='cuda:0')\n",
      "2500/4167 \t 1.4000545740127563\t0.25 tensor(13., device='cuda:0')\n",
      "2600/4167 \t 2.0034520626068115\t0.1666666716337204 tensor(10., device='cuda:0')\n",
      "2700/4167 \t 1.5030748844146729\t0.125 tensor(13., device='cuda:0')\n",
      "2800/4167 \t 1.3356653451919556\t0.3333333432674408 tensor(13., device='cuda:0')\n",
      "2900/4167 \t 1.5295761823654175\t0.30000001192092896 tensor(11., device='cuda:0')\n",
      "3000/4167 \t 1.4185158014297485\t0.25 tensor(14., device='cuda:0')\n",
      "3100/4167 \t 1.9200489521026611\t0.20000000298023224 tensor(12., device='cuda:0')\n",
      "3200/4167 \t 2.276984453201294\t0.09090909361839294 tensor(9., device='cuda:0')\n",
      "3300/4167 \t 1.6911227703094482\t0.25 tensor(10., device='cuda:0')\n",
      "3400/4167 \t 1.0664836168289185\t0.1666666716337204 tensor(15., device='cuda:0')\n",
      "3500/4167 \t 1.50193452835083\t0.125 tensor(13., device='cuda:0')\n",
      "3600/4167 \t 2.226026773452759\t0.0 tensor(8., device='cuda:0')\n",
      "3700/4167 \t 1.5580533742904663\t0.1666666716337204 tensor(13., device='cuda:0')\n",
      "3800/4167 \t 1.4832324981689453\t0.375 tensor(15., device='cuda:0')\n",
      "3900/4167 \t 1.279937744140625\t0.6666666865348816 tensor(16., device='cuda:0')\n",
      "4000/4167 \t 2.1321468353271484\t0.1111111119389534 tensor(12., device='cuda:0')\n",
      "4100/4167 \t 1.468871831893921\t0.2857142984867096 tensor(14., device='cuda:0')\n"
     ]
    }
   ],
   "source": [
    "# Watch-duration head: 10-class cross-entropy, with class 0 given half the weight of the others.\n",
    "# Share head: binary cross-entropy on raw logits.\n",
    "watch_loss_fn = nn.CrossEntropyLoss(weight=torch.FloatTensor([1,2,2,2,2,2,2,2,2,2]).cuda())\n",
    "share_loss_fn = nn.BCEWithLogitsLoss()\n",
    "optimizer = torch.optim.Adam(model.parameters(), lr=0.005)\n",
    "\n",
    "model.train()\n",
    "for _ in range(3):  # three passes over the training data\n",
    "    for idx, data in enumerate(train_loader):\n",
    "        feed_dict = {\n",
    "            'user_id': data[0].long().cuda(),\n",
    "            'video_id': data[1].long().cuda(),\n",
    "        }\n",
    "        watch_label = data[2].long().cuda()\n",
    "        share_label = data[3].float().cuda()\n",
    "\n",
    "        optimizer.zero_grad()\n",
    "        wathch_pred, share_pred = model(feed_dict)\n",
    "        # Joint objective: unweighted sum of the two task losses.\n",
    "        loss = watch_loss_fn(wathch_pred, watch_label) + share_loss_fn(share_pred, share_label)\n",
    "\n",
    "        loss.backward()\n",
    "        optimizer.step()\n",
    "\n",
    "        if idx % 100 == 0:\n",
    "            # Accuracy restricted to samples whose true watch label is non-zero;\n",
    "            # clamp the denominator so a batch with no such samples yields 0 instead of NaN.\n",
    "            correct = wathch_pred.argmax(1) == watch_label\n",
    "            nonzero = watch_label != 0\n",
    "            acc = correct[nonzero].float().sum() / nonzero.float().sum().clamp(min=1)\n",
    "            print(f'{idx}/{len(train_loader)} \\t {loss.item()}\\t{acc}', correct.float().sum())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-08-15T19:08:33.243832Z",
     "iopub.status.busy": "2021-08-15T19:08:33.243357Z",
     "iopub.status.idle": "2021-08-15T19:08:33.265851Z",
     "shell.execute_reply": "2021-08-15T19:08:33.265276Z",
     "shell.execute_reply.started": "2021-08-15T19:08:33.243792Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([ True, False, False, False,  True,  True,  True,  True, False,  True,\n",
       "        False, False, False, False, False, False, False, False,  True],\n",
       "       device='cuda:0')"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "wathch_pred.argmax(1) == watch_label"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-08-15T19:08:36.924266Z",
     "iopub.status.busy": "2021-08-15T19:08:36.923662Z",
     "iopub.status.idle": "2021-08-15T19:11:04.129207Z",
     "shell.execute_reply": "2021-08-15T19:11:04.128431Z",
     "shell.execute_reply.started": "2021-08-15T19:08:36.924218Z"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "test_watch = []\n",
    "test_share = []\n",
    "model.eval()  # inference mode (affects layers such as dropout and batch norm, if present)\n",
    "with torch.no_grad():\n",
    "    for data in test_loader:\n",
    "        feed_dict = {\n",
    "            'user_id': data[0].long().cuda(),\n",
    "            'video_id': data[1].long().cuda(),\n",
    "        }\n",
    "        wathch_pred, share_pred = model(feed_dict)\n",
    "\n",
    "        # Predicted watch bucket = arg-max class; share = 1 when the sigmoid probability exceeds 0.5.\n",
    "        test_watch += wathch_pred.argmax(1).cpu().tolist()\n",
    "        test_share += (share_pred.sigmoid() > 0.5).int().cpu().flatten().tolist()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-08-15T19:16:13.000656Z",
     "iopub.status.busy": "2021-08-15T19:16:13.000026Z",
     "iopub.status.idle": "2021-08-15T19:16:13.911874Z",
     "shell.execute_reply": "2021-08-15T19:16:13.910817Z",
     "shell.execute_reply.started": "2021-08-15T19:16:13.000598Z"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "test_data['watch_label'] = test_watch\n",
    "test_data['is_share'] = test_share"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-08-15T19:16:14.417697Z",
     "iopub.status.busy": "2021-08-15T19:16:14.417144Z",
     "iopub.status.idle": "2021-08-15T19:16:16.950691Z",
     "shell.execute_reply": "2021-08-15T19:16:16.949621Z",
     "shell.execute_reply.started": "2021-08-15T19:16:14.417651Z"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "test_data.to_csv('submission.csv', index=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-19T08:57:08.384850Z",
     "iopub.status.busy": "2021-07-19T08:57:08.384427Z",
     "iopub.status.idle": "2021-07-19T08:57:11.487821Z",
     "shell.execute_reply": "2021-07-19T08:57:11.486325Z",
     "shell.execute_reply.started": "2021-07-19T08:57:08.384816Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  adding: submission.csv (deflated 63%)\n"
     ]
    }
   ],
   "source": [
    "!zip submission.csv.zip submission.csv"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  },
  "toc-autonumbering": true
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
